newzmat

The newzmat utility was designed primarily for converting molecule specifications between a variety of standard formats. It can also perform many related functions, such as extracting molecule specifications from Gaussian checkpoint files.

COMMAND SYNTAX

newzmat has the following general syntax:

newzmat option(s) input-file output-file

where option(s) is one or more options specifying the desired operations, input-file is the file containing the structure to be converted (or retrieved), and output-file is the file in which to place the new molecule specification (or Gaussian input). A slightly different syntax is used when merging information from two files (see MERGING INFORMATION FROM TWO FILES USING NEWZMAT below).

If the output filename is omitted, it is given the same base name as the input file, along with a conventional extension denoting its file type. In general, extensions can be omitted from file specifications provided that extension conventions are followed. The default extensions are listed in the following table:

Extension  Description                                                                 Option Form
.bgf       Biograf internal data file                                                  bgf
.cac       CaChe molecule file                                                         cache
.chk       Gaussian 09 checkpoint file                                                 chk
.com       Gaussian input file (Z-matrix mol. spec.)                                   zmat
.com       Gaussian input file (Cart. coords. mol. spec.)                              cart
.con       QUIPU system data file                                                      con
.dat       Model/XModel/MM2 data file                                                  model
.dat       MacroModel data file (may be formatted or unformatted)                      mmodel, ummodel
.ent       Brookhaven data file (≡PDB)                                                 ent
.com       Fractional coords. for crystal structures (requires exactly 3 trans. ops.)  fract
.inp       MOPAC input file                                                            mopac
.pdb       Protein Data Bank format (≡Brookhaven)                                      pdb
.ppp       Some PPP program (output only)                                              ppp
.xyz       Unadorned Cartesian coordinates                                             xyz
.zin       Ancient version of ZIndo                                                    zindo
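
The filename conventions described above can be sketched in a few lines of Python. This is only an illustration of the stated rules, not newzmat's own logic: the function names are hypothetical, and the handling of unusual names (for example, names containing several dots) is an assumption.

```python
from pathlib import Path

# Option form -> conventional extension, copied from the table above.
DEFAULT_EXTENSION = {
    "bgf": ".bgf", "cache": ".cac", "chk": ".chk", "zmat": ".com",
    "cart": ".com", "con": ".con", "model": ".dat", "mmodel": ".dat",
    "ummodel": ".dat", "ent": ".ent", "fract": ".com", "mopac": ".inp",
    "pdb": ".pdb", "ppp": ".ppp", "xyz": ".xyz", "zindo": ".zin",
}


def with_default_extension(name: str, option_form: str) -> str:
    """Append the conventional extension when the name has none (hypothetical helper)."""
    if Path(name).suffix:
        return name  # an explicit extension is left untouched
    return name + DEFAULT_EXTENSION[option_form]


def default_output_name(input_name: str, output_form: str = "zmat") -> str:
    """Same base name as the input file, with the output format's extension."""
    return str(Path(input_name).with_suffix(DEFAULT_EXTENSION[output_form]))
```

For instance, with_default_extension("water", "pdb") yields water.pdb, and default_output_name("water.pdb") yields water.com because -ozmat is the default output format.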

INPUT AND OUTPUT OPTIONS

The options specifying the formats of the input and output molecule specifications are formed from the string -i or -o (respectively), followed immediately by the appropriate option form string from the preceding table corresponding to the desired molecule specification format (no spaces intervene). For example, -ipdb indicates that the input molecule specification is in PDB format and that the extension .pdb should be applied to the input filename if no extension is specified. Similarly, -oxyz specifies an output format of Cartesian coordinates along with a default extension of .xyz for the output filename. The default input and output options are -izmat and -ozmat. Note that -izmat and -icart are synonyms, and either one of them can read a Gaussian input file containing any molecule specification format: Z-matrix, Cartesian coordinates, or mixed internal and Cartesian coordinates.
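
When newzmat is driven from a script, the format options can be composed mechanically from the option forms. The following Python sketch builds an argument list following the general syntax given earlier; the helper name newzmat_command is hypothetical, and the sketch only constructs the command without running it.

```python
import shlex

# Option forms from the table of file types in this document.
OPTION_FORMS = frozenset({
    "bgf", "cache", "chk", "zmat", "cart", "con", "model", "mmodel",
    "ummodel", "ent", "fract", "mopac", "pdb", "ppp", "xyz", "zindo",
})
OUTPUT_ONLY_FORMS = frozenset({"ppp"})  # the table marks ppp as "output only"


def newzmat_command(input_file, output_file=None, input_form="zmat",
                    output_form="zmat", extra_options=()):
    """Return an argv list: newzmat option(s) input-file [output-file]."""
    for form in (input_form, output_form):
        if form not in OPTION_FORMS:
            raise ValueError(f"unknown option form: {form!r}")
    if input_form in OUTPUT_ONLY_FORMS:
        raise ValueError(f"{input_form!r} can only be used as an output format")
    # -i/-o are followed immediately by the option form, with no space.
    argv = ["newzmat", "-i" + input_form, "-o" + output_form]
    argv.extend(extra_options)
    argv.append(input_file)
    if output_file is not None:
        argv.append(output_file)
    return argv


def as_shell_line(argv):
    """Render the argument list as a single shell command line."""
    return shlex.join(argv)
```

The resulting list can be passed to a process launcher of your choice, or printed with as_shell_line for inspection before it is run.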

SELECTING AN OUTPUT FORMAT FOR DATA INTERCHANGE

To exchange structures with a visualization system that newzmat does not support directly, try the PDB format first: it includes connectivity information and is widely supported. Note that some software packages use the .ent extension rather than .pdb; the -ient and -oent options select the former, while -ipdb and -opdb select the latter. Another commonly used alternative is the MOPAC file format.

OTHER OPTIONS RELATED TO INPUT AND OUTPUT

The following options further specify the input for newzmat:

-ngeom N
Use the geometry from the Nth structure in the checkpoint file. This functions in the same manner as Geom=(NGeom=N).

-ot list
Use the geometries from the listed structures in the checkpoint file. Lists can include multiple structure numbers (separated by commas) and ranges of structure numbers. For example, -ot 3,7 extracts the structures from steps 3 and 7, and -ot 2-5 extracts all structures from step 2 through step 5.

-step N
Use the structure from step N of the geometry optimization data in a Gaussian 09 checkpoint file (valid only for the -ichk input option).

This option is not available for optimizations in redundant internal coordinates (the default coordinate system). Instead, retrieve the structure from the checkpoint file in a subsequent job by using a route section containing Geom=(Check,Step=N).

-ubohr
Distances in the input file are specified in Bohr (the default is Angstroms).

-urad
Angles in the input are specified in radians (the default is degrees).
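
The list syntax accepted by -ot (comma-separated structure numbers and ranges) can be illustrated with a small Python parser. This is a sketch of the syntax as described above, not newzmat's own parser; the function name is hypothetical, and it is an assumption that single numbers and ranges may be mixed in one list.

```python
from __future__ import annotations


def parse_structure_list(spec: str) -> list[int]:
    """Expand a list such as "3,7" or "2-5" into explicit structure numbers."""
    numbers: list[int] = []
    for item in spec.split(","):
        item = item.strip()
        if "-" in item:
            # A range such as 2-5 includes both end points.
            first, last = (int(part) for part in item.split("-", 1))
            if first > last:
                raise ValueError(f"descending range: {item!r}")
            numbers.extend(range(first, last + 1))
        else:
            numbers.append(int(item))
    return numbers
```

With this sketch, "3,7" expands to structures 3 and 7, and "2-5" expands to structures 2, 3, 4 and 5, matching the two examples given for -ot.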

The following options retrieve charges from the checkpoint file and apply them as the MM charges for both regular atoms and link atoms. These options specify which kind of charges to retrieve:

-qmul
Mulliken charges.

-qesp
ESP-fit charges.

-qaim
AIM charges.

-qnpa
NPA charges.

-qapt
APT charges.

The following options further specify the output file format:

-mof1
Use MacroModel format 1 (only valid with -ommodel).

-mof2
Use MacroModel format 2 (this is the default if -ommodel is specified).

-optprompt
Prompt for which parameters should be optimized; used when setting up a molecule specification destined for a geometry optimization and -ozmat is specified (or no output option is included). By default, all parameters not fixed by symmetry are optimized.

-prompt
Prompt for route section and title section lines and for the charge and multiplicity when using -ozmat (or no output option is specified). Gaussian input files produced by newzmat set up HF/6-31G(d) single point energy calculations by default.

EXAMPLES

The following command reads the molecule specification from the PDB file water.pdb and writes a Gaussian input file, including the equivalent Z-matrix, to the file h2o.com:

$ newzmat -ipdb water h2o        -ozmat is the default, so it can be omitted.
 Charge and multiplicity [0,1]?  A return accepts the default values shown.

newzmat prompts for the charge and multiplicity for the Z-matrix since these items cannot be determined from the PDB file.

The following command reads the molecule specification from the Gaussian 09 checkpoint file job-11234.chk and writes the PDB file propell.pdb:

$ newzmat -ichk -opdb job-11234 propell

The following command reads the molecule specification from step 5 of the optimization in the checkpoint file newopt.chk and produces the MOPAC file step5.inp:

$ newzmat -ichk -omopac -step 5 newopt step5

The following command reads the molecule specification from the MOPAC file newsalt.inp and writes a Gaussian input file, including the equivalent Z-matrix, to the file newsalt.com, prompting for the route and title sections and for the charge and spin multiplicity of the molecule:

$ newzmat -imopac -prompt newsalt
Percent or Route card? # B3LYP/6-31G(d,p) Opt 
Route card?                        End route section with a blank line.
Titles? Optimization of caffeine at B3LYP/6-31G**
Titles?                            End title section with a blank line.
Charge and Multiplicity? 0,1

MERGING INFORMATION FROM TWO FILES USING NEWZMAT

newzmat can create an input file in which data from two different files have been combined. This feature is helpful for setting up ONIOM calculations in which one wants to apply a custom setup (ONIOM layers, MM atom types, MM atom charges, etc.) from one file to a structure in a different file. It allows you to specify the custom setup only once and then apply that information to other structures later.

The -t and -s options request and control the creation of the merged input file. A merge command has the following general form:

$ newzmat -itype [-otype] -tformat -sitemn files

The command requires that you specify the locations of the existing input file (input), the new input file you are creating (output), and the template file (template). The order of the input, output and template files on the command line varies depending on whether the input and/or template file(s) are Gaussian checkpoint files:

Input is checkpoint file?  Template is checkpoint file?  files arguments
n                          n                             input output ignore template
n                          y                             input output template
y                          n                             ignore output input template
y                          y                             input output template

In the preceding table, "ignore" is a placeholder.
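
The ordering rules in the table above can be written as a small Python function. This is a sketch that transcribes the table as given; the function name is hypothetical, and the literal word ignore is used as the placeholder simply because the example later in this section does so.

```python
def merge_file_arguments(input_file, output_file, template_file,
                         input_is_chk, template_is_chk):
    """Return the files arguments in the order given by the table above."""
    if not input_is_chk and not template_is_chk:
        # Neither file is a checkpoint file.
        return [input_file, output_file, "ignore", template_file]
    if input_is_chk and not template_is_chk:
        # Only the input is a checkpoint file.
        return ["ignore", output_file, input_file, template_file]
    # The template is a checkpoint file (the input may or may not be).
    return [input_file, output_file, template_file]
```

Encoding the table once in a helper like this avoids having to recall the placeholder position each time a merge command is assembled by a script.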

-t requires a file type argument like -i and -o, and it accepts the same values.

The -s option, which can be specified multiple times, indicates the various information for the new input file. It accepts the following keywords, each of which is followed by either 1 or 2, which indicates that the item should be taken from the input or template file (respectively):

XYZn   Geometry (coordinates, nuclear charges, etc.).
MMTn   MM types.
MMCn   MM charges. If copied, they are also copied to link atoms if ONIOM data is present.
PDBn   PDB information (including secondary structure).
Conn   Connectivity.
Onin   ONIOM layer and link atom data.
Micn   MicOpt (freezing/optimizing atoms and rigid block info).

The default for all items is the input file (i.e., file 1).

Here is an example. We have a Gaussian input file, allsetup.gjf, containing the MM atom types, MM charges, PDB information, connectivity, ONIOM layers, and so on. We can apply all of these settings to the structure in the PDB file new.pdb with the following command:

$ newzmat -icart -tpdb -sXYZ2 -sCon2 allsetup.gjf newinput.gjf ignore new.pdb

This command takes only the geometry and connectivity from file 2 (the template file, new.pdb) and everything else (MM atom types, MM charges, PDB information, ONIOM data, and optimize/freeze atom data) from file 1, the input file allsetup.gjf. The output file, newinput.gjf, will be the result of this merge.
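
Because every item defaults to file 1, only the items to be taken from the template need a -s option. The following Python sketch generates those options from the keyword table; the function name is hypothetical, and the keyword spelling is copied from the table as shown (whether newzmat treats the keywords as case-sensitive is not stated here).

```python
# Keywords accepted by -s, as listed in the keyword table in this section.
MERGE_ITEMS = ("XYZ", "MMT", "MMC", "PDB", "Con", "Oni", "Mic")


def selector_options(from_template):
    """Build the -s options for items to be taken from the template (file 2)."""
    requested = set(from_template)
    unknown = requested - set(MERGE_ITEMS)
    if unknown:
        raise ValueError(f"unknown merge items: {sorted(unknown)}")
    # Items not requested are omitted, so they default to the input file (file 1).
    return [f"-s{item}2" for item in MERGE_ITEMS if item in requested]
```

Calling selector_options(["XYZ", "Con"]) gives -sXYZ2 and -sCon2, the same two selectors used in the example command above.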

OTHER NEWZMAT OPTIONS

The other options to newzmat are concerned with generating connectivity information, with the use of standard geometrical parameters, and with the determination and use of molecular symmetry. A complete connectivity table can be used to generate Z-matrix specifications suitable for inclusion of symmetry constraints. Such a table is also required for output of the data files for the molecular mechanics programs. If one of the input formats which includes full connectivity is used (e.g., MacroModel data files), the connectivity that it provides is used.

However, when Z-matrix or MOPAC format input is provided, only the connectivity information which is implied by the internal coordinate specification is available. Thus if a new Z-matrix which incorporates the molecular symmetry is to be generated, the remaining connectivity information must be generated. When Cartesian coordinates are read in, naturally, no connectivity information is provided, so the default is to generate the table using the internally stored atomic radii. In addition, when used to generate input structures, the mechanics programs may not generate suitable bond distances and often produce coordinates which are close to but not exactly symmetric. Options control how each of these cases is handled.

-allbonded
In generating new connectivity information, assume all atoms are bonded.

-bmodel
Use standard model B bond lengths along with internal values in determining bond distances.

-density N
Generate natural orbitals for density number N. This option is only useful if you are generating a CaChe file. N should be set to 0 for HF, to 2 for MP2, to 6 for CI, and to 7 for QCISD or CCD.

-fudge
Fudge bond distances to make sure they are reasonable, using internal values. This is the default for model input and is not applicable elsewhere.

-gencon
Generate connectivity information using internal radii.

-getfile
Insist on filename specifications for all arguments, making standard input and output unacceptable.

-lsymm
Use loose cutoffs for determining symmetry. This option implies -symav.

-mdensity M
Subtract generalized density M from that specified with -density to make a difference density, which is then converted to natural orbitals.

-nofudge
Do not fudge bond distances. This is the default and only choice for all cases except model input.

-nogetfile
Cancels -getfile.

-noround
Turns off rounding of Z-matrix parameters.

-nosymav
Turns off averaging of input coordinates.

-nosymm
Turns off all use of symmetry.

-order
Keeps the order of atoms as close as possible to the input order.

-round
Rounds Z-matrix parameters to 0.01 Å and 1 degree.

-symav
Average input coordinates using approximate symmetry operations to achieve exact symmetry.

-symm
Assign molecular symmetry.

-tsymm
Use tight cutoffs for determining symmetry. This option is the default.

-rebuildzmat
Build a new Z-matrix rather than using the read-in one (as would be the default for Z-matrix or MOPAC input). This option implies -gencon, and the option may be abbreviated as -redoz.
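
To make the idea behind -gencon concrete, the following Python sketch derives a connectivity table from Cartesian coordinates by comparing interatomic distances with sums of atomic radii. It is not newzmat's algorithm: the radii are approximate covalent radii chosen for illustration, the 1.2 tolerance factor is an assumption, and the function name is hypothetical.

```python
import math
from itertools import combinations

# Approximate covalent radii in Angstroms (illustrative values only;
# these are NOT the radii stored internally by newzmat).
COVALENT_RADIUS = {"H": 0.31, "C": 0.76, "N": 0.71, "O": 0.66}


def guess_bonds(atoms, tolerance=1.2):
    """Return index pairs (i, j) of atoms judged to be bonded.

    atoms is a list of (element_symbol, (x, y, z)) with coordinates in
    Angstroms. Two atoms are considered bonded when their distance does
    not exceed tolerance times the sum of their radii.
    """
    bonds = []
    for (i, (sym_i, pos_i)), (j, (sym_j, pos_j)) in combinations(enumerate(atoms), 2):
        cutoff = tolerance * (COVALENT_RADIUS[sym_i] + COVALENT_RADIUS[sym_j])
        if math.dist(pos_i, pos_j) <= cutoff:
            bonds.append((i, j))
    return bonds


# A water molecule: both hydrogens bond to oxygen, but not to each other.
water = [
    ("O", (0.000, 0.000, 0.117)),
    ("H", (0.000, 0.757, -0.469)),
    ("H", (0.000, -0.757, -0.469)),
]
```

A distance-based test of this kind explains why input formats that carry explicit connectivity are preferred when they are available: the result depends on the radii and the tolerance chosen.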

KNOWN DIFFICULTIES WITH NEWZMAT

 


Last update: 5 June 2013