Reading file inspectors

You can use the encoding inspector object to specify the encoding in which a file is read in a relevance expression.

If you do not specify an encoding, files are read in the local encoding. Use the encoding object to read a file as follows:

 file "filename" of encoding "encoding"

The encoding can be any name that ICU recognizes, such as ISO-8859-1, Shift_JIS, or UTF-8.
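
The effect of naming an encoding can be illustrated outside the relevance language. The following is a minimal Python sketch, not the inspector's implementation: the helper names `read_text` and `write_sample` are hypothetical, and Python codec names stand in for ICU names. It writes a Shift_JIS file, reads it back with the matching encoding, and then shows that reading the same bytes with a different encoding yields different text.

```python
import locale
import os
import tempfile


def read_text(path, encoding=None):
    """Read a file; with no encoding given, fall back to the local encoding."""
    if encoding is None:
        # Assumption: the locale's preferred encoding plays the role of the
        # "local encoding" mentioned in the text.
        encoding = locale.getpreferredencoding(False)
    with open(path, "r", encoding=encoding, newline="") as handle:
        return handle.read()


def write_sample(text, encoding):
    """Write text to a temporary file in the given encoding; return its path."""
    fd, path = tempfile.mkstemp(suffix=".txt")
    with os.fdopen(fd, "wb") as handle:
        handle.write(text.encode(encoding))
    return path


SAMPLE = "\u65e5\u672c\u8a9e"  # three Japanese characters
sample_path = write_sample(SAMPLE, "shift_jis")
try:
    # Matching encoding: the original text is recovered.
    decoded = read_text(sample_path, "shift_jis")
    # Mismatched encoding: the bytes decode, but to different characters.
    misread = read_text(sample_path, "iso-8859-1")
finally:
    os.remove(sample_path)
```

Here `decoded` equals the original text, while `misread` does not, which is why an expression such as a `contains` test can change its result depending on the encoding given.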

Using this encoding object, you can affect the behavior and results of relevance expressions that read file data, such as expressions that inspect the content or the lines of a file.

Here are some simple examples:

 (content of file "c:\aaa\bbb.txt" of encoding "Shift_JIS") contains "??"
 # Returns True if the word "??" is found in the file "c:\aaa\bbb.txt", which is written in Shift_JIS

 line 3 of file "eee.txt" of folder "/ccc/ddd" of encoding "Windows-1252"
 # Returns the third line of the file "/ccc/ddd/eee.txt", read in Windows-1252

 lines of file "/fff/ggg.txt" of encoding "UTF8"
 # Returns the lines of the file "/fff/ggg.txt", read in UTF8

 lines of file "/hhh/iii.txt" of encoding "ISO-8859-1"
 # Returns the lines of the file "/hhh/iii.txt", read in ISO-8859-1

You can apply the encoding object by adding it after any of the following keywords that create file objects:

  • file
  • folder
  • download file
  • download folder
  • find file <string> of <folder>
  • x32 file (Windows only)
  • x32 folder (Windows only)
  • x64 file (Windows only)
  • x64 folder (Windows only)
  • native file (Windows only)
  • native folder (Windows only)
  • symlink (Unix only)
  • hfs file (Mac only)
  • posix file (Mac only)
  • hfs folder (Mac only)
  • posix folder (Mac only)

The encoding object cannot be used with the creation methods for Mac special folders, such as apple extras folder or application support folder. For such folders, use the folder object and specify their paths.

Note: If you open a file with the encoding object and the file has a BOM, the file is opened in the encoding indicated by the BOM; that is, the specified encoding is ignored.
Note: If, for whatever reason, the BOM of the file does not reflect the encoding of its content, the file line inspector fails with the U_INVALID_CHAR_FOUND error.
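
The two notes above describe a BOM taking priority over the requested encoding. The following Python sketch models that rule under stated assumptions: the function name `decode_with_bom_priority` is hypothetical, the BOM table covers only the UTF-8, UTF-16, and UTF-32 marks, and Python's `UnicodeDecodeError` stands in for the U_INVALID_CHAR_FOUND error. It is an illustration of the described behavior, not the inspector's code.

```python
import codecs

# UTF-32 marks are checked before UTF-16 because the UTF-32 LE mark
# begins with the same two bytes as the UTF-16 LE mark.
_BOMS = [
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def decode_with_bom_priority(raw, requested_encoding):
    """Decode bytes, letting a leading BOM override the requested encoding.

    Returns a (text, encoding_used) pair. Raises UnicodeDecodeError when
    the bytes are not valid in the encoding that ends up being used.
    """
    for bom, bom_encoding in _BOMS:
        if raw.startswith(bom):
            # The BOM wins: strip it and ignore the requested encoding.
            return raw[len(bom):].decode(bom_encoding), bom_encoding
    return raw.decode(requested_encoding), requested_encoding


# A UTF-8 file with a BOM, opened "as Windows-1252": the BOM decides.
with_bom = codecs.BOM_UTF8 + "h\u00e9llo".encode("utf-8")
text, used = decode_with_bom_priority(with_bom, "windows-1252")

# A UTF-8 BOM in front of ISO-8859-1 content: decoding fails.
mismatched = codecs.BOM_UTF8 + "caf\u00e9".encode("iso-8859-1")
try:
    decode_with_bom_priority(mismatched, "iso-8859-1")
    mismatch_failed = False
except UnicodeDecodeError:
    mismatch_failed = True
```

In this sketch, `used` is `"utf-8"` even though Windows-1252 was requested, and the mismatched input fails to decode instead of silently falling back to the requested encoding.
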
Important:

File objects must be created as a property of the encoding object. You cannot apply an encoding to file objects that have already been created in the relevance expression:

(file "aaa.txt" of folder "c:\test") of encoding "Windows-1252"  
    # Does not work. The encoding is ignored.

In this relevance expression, the file C:\test\aaa.txt is read in the local encoding, not the Windows-1252 encoding. The expression enclosed in parentheses is evaluated first and creates a file object representing C:\test\aaa.txt, so the subsequent encoding expression is ignored.

In the following expression, the file C:\test\aaa.txt is read in the Windows-1252 encoding:
 file "aaa.txt" of folder "c:\test" of encoding "Windows-1252"
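
The difference between the two expressions can be pictured with a toy Python model. Every name in it (`FileObject`, `Encoding`, `plain_file`, `apply_to`, `LOCAL_ENCODING`) is hypothetical and exists only for illustration; it assumes nothing about how the relevance engine is built. The only point it makes is that the encoding is fixed at the moment the file object is created.

```python
from dataclasses import dataclass

# Stand-in label for whatever the local encoding happens to be.
LOCAL_ENCODING = "local"


@dataclass(frozen=True)
class FileObject:
    """A file whose read encoding is fixed when the object is created."""
    path: str
    encoding: str


class Encoding:
    """Toy counterpart of the encoding object."""

    def __init__(self, name):
        self.name = name

    def file(self, path):
        # File created as a property of the encoding: the encoding is captured.
        return FileObject(path, self.name)

    def apply_to(self, existing):
        # File already created elsewhere: it is returned unchanged.
        return existing


def plain_file(path):
    """Create a file with no encoding in scope, so the local one is used."""
    return FileObject(path, LOCAL_ENCODING)


# Models: (file "aaa.txt" of folder "c:\test") of encoding "Windows-1252"
late = Encoding("Windows-1252").apply_to(plain_file(r"c:\test\aaa.txt"))

# Models: file "aaa.txt" of folder "c:\test" of encoding "Windows-1252"
early = Encoding("Windows-1252").file(r"c:\test\aaa.txt")
```

In the model, `late.encoding` stays at the local encoding and `early.encoding` is Windows-1252, mirroring the parenthesized and unparenthesized expressions above.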