Kelv's Random Collection

A random collection of my contributions to the world.

Archive for the ‘Synalyze It!’ Category

IFF Grammar

Posted by kelvSYC on 7-21-2014

Hot on the heels of the SimCity 2000 grammar, here’s a quick Generic IFF grammar.  This grammar is made so that anyone can extend it to create their own grammar for file formats based on IFF.  (It is possible to use it on RIFF and AIFF-based grammars as well, but I plan on having more specialized grammars for those.)

The IFF grammar defines a few simple data structures.

  • The Base IFF Chunk is the abstract base structure for all chunks, including builtin chunk types FORM, LIST, PROP, and CAT.  To create your own, simply subclass this one, override the “Type ID” by inserting a fixed value, and fill in the contents of the internal Chunk Data structure.
  • The FORM Chunk, CAT Chunk, LIST Chunk, and PROP Chunk are all abstract base structures for their specific builtin chunk types.  Subclass them and modify their chunk data where necessary.  However, do not delete the Form Type or Contents Type fields in the Chunk Data, as these are part of the standard.
  • The FORM/LIST/CAT structure matches only FORM, LIST, and CAT Chunks.  You may replace them if necessary with something more specific.
  • Since FORM Chunk contents may be any user defined chunk type or FORM/LIST/CAT, consider subclassing FORM Chunk Contents for your needs.
  • The Properties substructure of the PROP Chunk Data is meant to hold all and only user-defined chunks.

The intent of the generic IFF Grammar is that it should be able to parse in its entirety any generic IFF documents, with a specific focus on FORM, CAT, LIST, and PROP Chunks (all four of which are reserved).  It does not cover the reserved FOR1-FOR9, LIS1-LIS9, and CAT1-CAT9 chunks, nor any commonly found chunk data types.

Limitations:

  • This grammar does not check that FORM type IDs are not all lowercase letters and is not punctuation-free.
  • The structure references are better represented as a script so as to better do parsing for embedded structures and the like, but script support is still a bit iffy at the present type.
  • Be aware that some past versions do not handle structure inheritance properly, and may not respect overrides on fixed values or the deletion of members in a subclass.  If this is an issue, feel free to extend the Base IFF Chunk instead.
  • There is no support for the “four spaces” chunk type.

Changelist after the break.

Download the IFF Grammar here!

Read the rest of this entry »

Advertisements

Posted in Synalyze It! | Leave a Comment »

SimCity 2000 Saved Cities

Posted by kelvSYC on 7-20-2014

Here’s a break from the constant madness that is ROM hacking: the SimCity 2000 Saved Game file format.

SimCity 2000 largely follows the Interchange File Format standard, with the notable exception of the requirement that chunks be aligned on two-byte boundaries.  Other notes:

  • Most of the data within the SimCity 2000 save file is compressed using a form of run-length encoding.  The RLE structures are mapped out, but I’ve not really implemented the script elements that will map them.  This is because scripting is buggy and is a huge performance hit on the build (it’s a prerelease build that fixes a few bugs from the last official build, as explained in an earlier post).  I’ve been told that the next release will fix this issue, but I don’t have even that build yet.
  • Synalyze It! can’t really parse decompressed data while the data is compressed.  The true meat-and-potatoes of the SimCity 2000 data is in fact compressed.
  • CNAM chunks may occur at the end of a file, but I have not seen any save that has that.
  • The string in the CNAM chunk appears to be “dirty Pascal string”.  Perhaps putting it as a C-string starting at the second byte might be better.

Hopefully I can get a generic IFF grammar going, as well as its cousin the RIFF, based on this.

Changelists after the break.

Download the SimCity 2000 Saved City Grammar here!

Read the rest of this entry »

Posted in Synalyze It! | Leave a Comment »

The Problem With Offset Arrays

Posted by kelvSYC on 6-13-2014

The offset primitive type in Synalyze It! is meant to be a pointer.  An offset field is basically an enhanced integer field, where the parsed integer, along with some context info (“Relative To”, “Additional”, and, of course, the referenced structure type), additionally renders a structure where the parsed integer refers to.  This is fine and all, but the offset model breaks down in many ways, of which I will talk about one here.

Just like any other data type, you can assign a repeat count to an offset to create a fixed-size array.  The problem is that the results tree simply doesn’t look right.  There are two things at play:

  • The referenced structures are inserted into the results tree after the rest of the structure is parsed.  This is extremely inconvenient, as one would expect that the structure would be right next to where the integer wold lie.
  • The referenced structure does not carry the array index of the offset array.  This means that in an array of offsets, all of the structures would appear to have an array index of 0.  Needless to say, finding the structure for a corresponding offset in your array is painful if the array is large.

Of course, a single script element can be used to actually solve both issues, and has the additional benefit of being able to compute the actual location with greater granularity than what the “Additional” field can provide (the tradeoff is that duplicating the “Relative to” field functionality by traversing the results tree is that much more difficult).  The overall idea in your script element, which will render the entire array, is to, on each iteration, parse the integer (via StructureMapper::getCurrentByteView()) and add it to your results tree (StructureMapper::getCurrentResults()), and then subsequently map the structure based on the value of the integer (StructureMapper::mapStructureAtPosition()).  Both Results::addElement() and StructureMapper::mapStructureAtPosition() take in the “iteration number”, which acts as the array index.  There are a couple of downsides to this, however:

  • It presumes that the structure is of fixed size.  Unfortunately, mapStructureAtPosition() takes in a “maximum size”, which is equivalent to giving a structure a fixed size and assuming that any data that’s left after rendering the structure is padding.
  • This is not reusable.  You will have to duplicate this code (and make subtle tweaks for array size, offset location, etc.) every time you need it.

All in all, quite a bit of effort to attempt to re-render a tree just because you don’t like where the referenced structures are located in the results tree.  More trouble than it’s worth, but it just seems like the way it is currently is a bug.

Posted in Synalyze It! | Leave a Comment »

Little Endian Bitfields

Posted by kelvSYC on 6-13-2014

In Synalyze It!, you can create integer fields of various sizes.  With whole numbers of bytes, you can make this into little-endian or big-endian structures.  This is all fine and good, but because of the fact that Synalyze It! has to process structure members in the order that they are declared, and a structures can have the property that the size of the structures is entirely determined by the contents therein, it means that true bitfields (where multiple integers are being crammed into a whole number of bytes) can only be rendered with Synalyze It! primitives if the bitfield is encoded as a big-endian integer.

This presents a problem: a lot of platforms are little-endian or mixed-endian, for one.  You could somehow get away with it if your fields somehow align neatly with byte boundaries, but this doesn’t always happen.

To get a better idea of what I am speaking of, consider a 16-bit integer acting as a bitfield.  This 16-bit integer consists of a 7-bit integer and a 9-bit integer.  If you try to model it using Synalyze It! primitives, the bitfield will always render it as follows:

xxxxxxx- -------- 7-bit integer
-------x xxxxxxxx 9-bit integer

This is regardless of the endianness of the integer that these two fields have been packed into.  Why is this, given that all structures have an endianness property?  That’s true, but that just refers to the endianness of each individual field, and not an instruction to render the structure with the bytes in reverse order.  (Again, Synalyze It! generally does not know the size of a structure before rendering it, and even if you had a fixed-size structure, Synalyze It! will not reverse the bytes contained within before rendering.)  Thus, bit field parsing only works if the 16-bit integer was big-endian.  If this was a little-endian integer that these two fields are packed into, then we have this in actuality:

-------- xxxxxxx- 7-bit integer
xxxxxxxx -------x 9-bit integer

In other words, the 9-bit integer is now in two pieces, and cannot be modelled by a single field.

Now, there are several ways you can work around this:

  • Coding the bitfield as an integer, and extracting the two fields using scripts.  The pro is that you can do this even without the Pro version: bitmasking will, to some extent, allow you to extract individual fields.  This is generally good enough if every field was one byte, but it also makes it impossible to fix some fields while leaving others unfixed, as, after all, Synalyze It! still considers your bitfield as a whole and never each field individually.
  • Model any field that spans multiple bytes as separate integers, and extracting the true value using scripts.  In order to create a script element that extracts the value, you would need to traverse the results tree.  This is generally feasible via Results::getResultsByName().  Then you can extract the values, and manipulate them via Lua or Python’s regular integer tools, and then insert a value into the results tree to represent the actual value.  The downside to this idea is that you still either have to deal with cutting the results tree to remove the original values (which also removes the ability to alter a value in the results tree and have the changes propagate to the actual file), or have to live with extraneous data polluting your results tree.
  • A custom element.  Custom elements provides the maximum flexibility, in that you have greater freedom to insert exactly what you want in the results tree.  There are two things of note: you are inserting a structure into a tree rather than a single value, and your inserted structure is now read-only: you don’t really have a “structure value” type, so implementing the fillByteRange() function is impossible.

The latter two approaches also means that you would have to custom-make this for every little-endian bitfield you encounter, and their implementations require the use of two techniques that I have found to be useful: zero-length script elements in the second approach, and “manual mapping via prototype” in the third approach.

Zero length script elements are fairly straightforward to implement:

currentElement = currentMapper.getCurrentElement()
currentMapper.addElement(currentElement, 0, 0, value)
return 0

The custom element approach is something entirely different.  First, you must create a top-level structure that will act as your prototype.  This structure won’t actually appear anywhere else in your grammar, but you can set up field sizes, element names, and such.  After creating this prototype structure, then, within parseByteRange(), you can then refer to your prototype structure via

currentGrammar = element.getEnclosingStructure().getGrammar()
prototype = currentGrammar.getStructureByName("Prototype")

From there, you can then use the given ByteView (byteView) to extract the necessary data, prototype.getElementByName() to retrieve the Elements corresponding to your fields, and simply add to the results tree: results.addStructureStart() takes in your prototype structure and results.addElementBits() takes your field Elements and Values.

That’s still quite a lot to do in order to properly render a structure that’s packed into a little-endian integer.  What’s still worse is that the custom element approach will still have a tendency to misrepresent your fields’ actual location in the hex field (the 7-bit integer, despite being solely taken from the second byte, will appear to be coming from the first byte).

In short, none of the solutions will give you the ability to render all and only the fields that you want without sacrificing generalizability, mutability, improper representation of rendering in the hex view, or script-free-ness.  See which approach works for you.

Posted in Synalyze It! | Leave a Comment »