• Viewing a file with UNICODE text strings

    #383738

    I have some files which are NOT Unicode files, according to the strict definition thereof, but which contain large numbers of Unicode text strings. This text is very difficult to read in the usual text viewers, like NOTEPAD, LIST, V, etc, because of the blank character (really null, but it looks blank!) which follows every ASCII text character, and the text is spread out twice as far as it need be, so recognising words, etc, is v. hard.

    Is anyone aware of a file viewer which would display the file, but with the null/blank characters omitted from the display, so that
    W o o d y ‘ s

    • #655364

      Probably a silly idea, but it is Friday.

      They won’t open in Word, will they?

      • #655367

        Not a silly idea at all! They will open, but instead of the nice (?) blank between characters you get the “nasty open square box”, which improves matters negatively…!

        • #655368

          So it would be equally silly to copy and paste the “nasty open square box” into the Find and Replace (with nothing) window?

          • #655369

            Nice try, but no cigar. It replaced a (very) few square boxes, but made no difference to the Unicode text strings. There would need to be a better way of specifying the 0x00 character in Search-and-Replace.

            SLIGHTLY LATER…

            I think I must be having Brane Fade, for I’ve just tried it (again, I thought!) in Vernon Buerg’s LIST — and it drops the binary zero characters, and thus gives me exactly what I was after.

            Time to go home for a lie-down, I think!

            Thanks for the ideas!

            • #656410

              Having had the weekend off and a day in London, I return and find some more files of the type I’m trying to view — and this time even LIST displays the Unicode text strings with a blank/null between each character.

              So now I am back at square one, still needing a program that will close up the Unicode text strings when viewed…
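
The pattern described in this thread (each ASCII character followed by a zero byte) is what ASCII text looks like when stored as UTF-16 little-endian. Below is a minimal Python sketch of a viewer that "closes up" such strings, assuming that encoding and that only ASCII-range strings matter. The function names and the minimum run length of 4 are illustrative choices, not something from the thread.

```python
import re

# A run of 4 or more printable ASCII characters, each followed by a zero byte,
# i.e. ASCII text stored as UTF-16 little-endian. The minimum length of 4 is an
# arbitrary choice to avoid accidental matches in surrounding binary data.
UTF16_RUN = re.compile(rb"(?:[\x20-\x7e]\x00){4,}")

# Bytes that are shown as-is: tab, line feed, carriage return.
WHITESPACE = (0x09, 0x0A, 0x0D)


def close_up(data: bytes) -> bytes:
    """Replace each UTF-16LE run with the same text as plain ASCII bytes."""
    # Every second byte of a matched run is the ASCII character itself.
    return UTF16_RUN.sub(lambda m: m.group(0)[::2], data)


def view(path: str) -> str:
    """Read a file, close up its UTF-16LE strings, and return displayable text."""
    with open(path, "rb") as f:
        data = close_up(f.read())
    # Any remaining non-printable byte is shown as a dot.
    return "".join(
        chr(b) if 0x20 <= b <= 0x7E or b in WHITESPACE else "."
        for b in data
    )
```

Calling `print(view("example.dat"))` (the file name is hypothetical) would show the file with the doubled-up strings closed up and everything else left in place.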

            • #656413

              Maybe you could attach an extract from one or two of the worst offenders? It might be easier to crack with something to play with directly.

            • #656415

              I’ve attached an example file of just over 4 K, slightly doctored to remove identifying information…

            • #656418

              Apparently, Internet Explorer does a good job of viewing Unicode text: the attached screenshot shows what your attachment looks like when I open it in the browser.

            • #656419

              I found opening in Notepad, Selecting All, Copying, then pasting into Word gave something relatively legible.

              (Interestingly, whilst in Word, I tried a search and replace on one arbitrary character. 2,000+ characters were removed with no discernible change in appearance!)

            • #656422

              The reply from Hans looks good to me – but when I try it, the file opens in NOTEPAD…

              Could you let me have the sequence of actions which would give the result you obtained, please?
              (I’m not used to these new-fangled GUI mechanisms…!)

              Even when I change the extension, and try “Open with…” and choose “Internet Explorer”, it just pauses for a short time, and nothing happens.

            • #656424

              How about this?

            • #656423

              Is the attached what you were after? I opened your file in Word, copied one of the “blanks” following an ASCII character, and did a search & replace all (with nothing). (Couldn’t get rid of the remaining boxes though.)

            • #656426

              Tim, that’s not bad at all! Why didn’t it work when either Leif or I tried it, then? [very puzzled indeed] (and no corresponding frownie!)

              I suppose the answer is really to get one of our programming chaps to read the file and strip out all characters less than blank, 0x20, and then view the resulting file in any available file viewer. But it would be so much easier to do it directly…
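
The "read the file and strip out all characters less than blank, 0x20" idea can be sketched in a few lines of Python. This is only an illustration: the function names are made up, and keeping tab, line feed and carriage return is an assumption added here so that line breaks survive (pass `keep=b""` to drop them too).

```python
def strip_control_bytes(data: bytes, keep: bytes = b"\t\n\r") -> bytes:
    """Drop every byte below 0x20 (including 0x00) except those listed in keep."""
    return bytes(b for b in data if b >= 0x20 or b in keep)


def clean_file(src: str, dst: str) -> None:
    """Write a copy of src with the control bytes removed."""
    with open(src, "rb") as f:
        data = f.read()
    with open(dst, "wb") as f:
        f.write(strip_control_bytes(data))
```

The cleaned copy can then be opened in any ordinary text viewer. Bytes of 0x80 and above are left untouched by this sketch, so some stray characters may still appear.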

            • #656427

              OK, Leif – that’s v. good. How done?

            • #656436

              Open Word with a blank doc.
              Open file in Notepad, select all, copy and paste into Word.
              Delete contents of Notepad
              Select All in Word, copy and paste to Notepad.
              Copy contents of Notepad back to (new) Word doc.
              In Word, select a single space between characters, copy to Find and Replace and replace with (nothing). Select Replace All.
              Select all and copy back to Notepad for a finished .txt version.

              The multiple back-and-forth between Word and Notepad *may* be superfluous, but did seem to make a difference.

            • #656439

              Thanks, Leif. I’ve taken your method and removed all references to Word from it, so that I just open the file in NOTEPAD, do Edit => Replace, cut and paste one of the “inter-ASCII-characters” characters into the Find What box, then click on Replace All. The window then shows a fascinating removal of large quantities of characters, and I end up with mostly ASCII characters with a few “open square boxes” separating them.

              I would point out that the above was tried on Windows XP, which MAY have something to do with anomalous results being obtained by various people, me included!!

              I think that’s close to the best I’m going to get without some programming, so I thank everyone VERY much for their help!

            • #656447

              Curiously, it seems to work better doing it in WordPad rather than Notepad. All the spurious characters become underscores when copied to Word and can be deleted in one hit. This is in W2k and Office 2k.

            • #656451

              Well, I can’t get the “open square box” (probably 0x00) character to paste itself into the Find box in WordPad! So it seems I’m sticking with NotePad.

              {traditional sigh of exasperation ON}
              You’d think that Microsoft would have done something to make this easier…
              {traditional sigh of exasperation OFF}
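
One way to check the guess that the "open square box" is 0x00, and to see what the leftover boxes are, is to count the non-printable bytes in the file. This is a rough Python sketch; the function name is hypothetical.

```python
from collections import Counter


def nonprintable_counts(data: bytes) -> list[tuple[str, int]]:
    """Count each byte outside printable ASCII, most frequent first."""
    counts = Counter(b for b in data if not 0x20 <= b <= 0x7E)
    return [(f"0x{b:02X}", n) for b, n in counts.most_common()]
```

Reading the file with `open(path, "rb").read()` and passing the result to this function gives a list of (hex value, count) pairs. If 0x00 heads the list by a wide margin, the remaining entries are the candidates for the boxes that survive the Replace All.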

            • #656473

              But the little box disappears when copying from Word to Notepad – perhaps because Word doesn’t ‘recognise’ it, it doesn’t get caught up in the copy…?

            • #656433

              Maybe it’s just this special relationship I’ve developed with my beast? I’m curious too as to why it didn’t work. I just retried using a normal “space”, with the same result.

            • #657669

              The reason this is difficult to read in Notepad (at least as I have set it) is that the spaces are very wide using Courier font. If I change to Times and move my chair back a couple of feet, it looks just fine.
