The "EMI" file format

The format is still being researched and this document is not yet complete, so some information may be wrong or simply missing.

Table of contents:

  1. Introduction
  2. Header
  3. Data
  4. Libraries and tools

Introduction

The EMI files are the containers for all of the game's data, apart from the videos. They are located in the BIN directory on the CD, organized by category into sub-folders.

They are binary files, with a header containing a table of contents of the data they contain, followed by the data in raw format.

Information

Identical copies of the same data can appear in several EMI files.

The reason is that loading several pieces of data from different places on a disc is slow. When the data to be loaded is grouped together in one place, fewer reads are required and loading is much faster.

Header

The header is structured as follows:

  • The number of data entries contained in the file,
  • A magic word MATH_TBL,
  • A table of contents of the various data.

Let's take a closer look at the data in this header:

From  To    Size  Type   Description
0x00  0x03  4     u32    Data entry count
0x04  0x07  4     ?      Unknown (version?)
0x08  0x0F  8     u8[8]  Magic word "MATH_TBL"
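
The layout above can be read with a few lines of C++. This is only a minimal sketch: the names EmiHeader, read_u32_le and parse_emi_header are hypothetical, and it assumes that integers are stored little-endian (as on the PlayStation CPU), which this document has not confirmed.

```cpp
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical structure holding the two numeric header fields.
struct EmiHeader {
    std::uint32_t entry_count;  // 0x00-0x03: data entry count
    std::uint32_t unknown;      // 0x04-0x07: unknown, possibly a version
};

// Assumption: integers are stored little-endian.
inline std::uint32_t read_u32_le(const std::uint8_t* p) {
    return static_cast<std::uint32_t>(p[0])
         | (static_cast<std::uint32_t>(p[1]) << 8)
         | (static_cast<std::uint32_t>(p[2]) << 16)
         | (static_cast<std::uint32_t>(p[3]) << 24);
}

// Fills `out` and returns true if the buffer starts with a plausible header.
// Returns false if it is shorter than 16 bytes or the magic word is wrong.
inline bool parse_emi_header(const std::vector<std::uint8_t>& file, EmiHeader& out) {
    static const char kMagic[8] = {'M', 'A', 'T', 'H', '_', 'T', 'B', 'L'};
    if (file.size() < 16) return false;
    if (std::memcmp(file.data() + 8, kMagic, 8) != 0) return false;
    out.entry_count = read_u32_le(file.data());
    out.unknown = read_u32_le(file.data() + 4);
    return true;
}
```

Checking the magic word first is a cheap way to reject files that are not EMI containers before trusting the entry count.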

Immediately after these 16 bytes comes the table of contents. It consists of one 16-byte "line" per piece of data (as many lines as the data entry count in the header), each indicating:

  • The size of the data,
  • Data parameters (details below),
  • The first 4 bytes of the data,
  • Data type (to be confirmed).

Here are the details of one line in the table of contents:

From  To    Size  Type   Description
0x00  0x03  4     u32    Data size
0x04  0x07  4     u32    Pointer in RAM
0x08  0x0B  4     u8[4]  First 4 bytes
0x0C  0x0D  2     u16    Data type ?
0x0E  0x0F  2     ?      Garbage data
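
As an illustration, the table of contents could be decoded as follows. This is a sketch under assumptions, not a confirmed implementation: EmiEntry, parse_emi_toc and the read helpers are hypothetical names, integers are assumed to be little-endian, and entry_count is assumed to come from the first 4 bytes of the file.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical structure mirroring one 16-byte line of the table of contents.
struct EmiEntry {
    std::uint32_t size;            // 0x00-0x03: data size
    std::uint32_t ram_pointer;     // 0x04-0x07: pointer in RAM
    std::uint8_t  first_bytes[4];  // 0x08-0x0B: first 4 bytes of the data
    std::uint16_t type;            // 0x0C-0x0D: data type (to be confirmed)
    // 0x0E-0x0F: garbage, ignored
};

// Assumption: integers are stored little-endian.
inline std::uint32_t read_u32_le(const std::uint8_t* p) {
    return static_cast<std::uint32_t>(p[0])
         | (static_cast<std::uint32_t>(p[1]) << 8)
         | (static_cast<std::uint32_t>(p[2]) << 16)
         | (static_cast<std::uint32_t>(p[3]) << 24);
}

inline std::uint16_t read_u16_le(const std::uint8_t* p) {
    return static_cast<std::uint16_t>(p[0] | (p[1] << 8));
}

// Reads `entry_count` lines located right after the 16-byte header.
// Returns false if the buffer is too short to hold that many lines.
inline bool parse_emi_toc(const std::vector<std::uint8_t>& file,
                          std::uint32_t entry_count,
                          std::vector<EmiEntry>& out) {
    const std::size_t kHeaderSize = 16;
    const std::size_t kEntrySize = 16;
    if (file.size() < kHeaderSize) return false;
    if (entry_count > (file.size() - kHeaderSize) / kEntrySize) return false;
    out.clear();
    out.reserve(entry_count);
    for (std::uint32_t i = 0; i < entry_count; ++i) {
        const std::uint8_t* p = file.data() + kHeaderSize + i * kEntrySize;
        EmiEntry e;
        e.size = read_u32_le(p);
        e.ram_pointer = read_u32_le(p + 4);
        for (int b = 0; b < 4; ++b) e.first_bytes[b] = p[8 + b];
        e.type = read_u16_le(p + 12);
        out.push_back(e);
    }
    return true;
}
```

The stored first 4 bytes can later be compared with the start of each extracted data block as a sanity check.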

A list of known data types:

Type ID  Description
3        Image
6        Audio sample header (PSX/VH)
7        Audio sample data (PSX/VB)
10       Music sequence / MIDI (PSX/SEQ)

Many entries have type 0; these hold miscellaneous data that does not correspond to any particular type.
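
For tools that list the contents of an EMI file, the known type IDs can be turned into labels with a small lookup. The function name emi_type_name is hypothetical, and since the type field itself is still to be confirmed, the mapping below only restates the list above.

```cpp
#include <cstdint>
#include <string>

// Maps a type ID from the table of contents to a human-readable label.
// Only the IDs documented so far are covered; everything else is "Unknown".
inline std::string emi_type_name(std::uint16_t type) {
    switch (type) {
        case 0:  return "Miscellaneous (no particular type)";
        case 3:  return "Image";
        case 6:  return "Audio sample header (PSX/VH)";
        case 7:  return "Audio sample data (PSX/VB)";
        case 10: return "Music sequence / MIDI (PSX/SEQ)";
        default: return "Unknown";
    }
}
```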

Data

Note that in EMI files, each piece of data always starts at a file position that is a multiple of 2048 (0x800).

For example:

  • The first data in an EMI file always starts at file position 2048 (0x800),
  • The position of the second data depends on:
    • where the previous data begins,
    • the size of the previous data,
    • the next multiple of 2048 at or after the end of the previous data.

Written as code, it looks like this (in C++, the right shift discards the low bits, so it acts as an integer division by 2048 that rounds down):

unsigned int next_data_position = current_data_position + ((current_data_size + 0x7FF) >> 0xB) * 0x800;

/*
For example, the first data starts at 0x800 and has a size of 2100 bytes.
To find the next data:
- First, the rounding part: "(current_data_size + 0x7FF) >> 0xB" -> "(2100 + 2047) >> 11" -> "4147 >> 11" -> "2".
  This is the number of 2048-byte blocks occupied by the current data.

- Then: "current_data_position + 2 * 0x800" -> "2048 + 2 * 2048" -> 6144 (0x1800).
  This is the actual position of the next data in the file.
*/
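
Applying the same formula repeatedly gives the position of every data in the file from the sizes listed in the table of contents. The following is a minimal sketch: emi_data_offsets and kSectorSize are hypothetical names, the first position (0x800) is taken from the rule stated above, and a data of size 0 (a case this document does not describe) would simply share its position with the next one.

```cpp
#include <cstdint>
#include <vector>

const std::uint32_t kSectorSize = 0x800;  // 2048 bytes

// Returns the file position of each data, given the sizes in table order.
// Sizes close to the 32-bit limit would overflow; that is not handled here.
inline std::vector<std::uint32_t> emi_data_offsets(const std::vector<std::uint32_t>& sizes) {
    std::vector<std::uint32_t> offsets;
    offsets.reserve(sizes.size());
    std::uint32_t position = kSectorSize;  // the first data always starts at 0x800
    for (std::uint32_t size : sizes) {
        offsets.push_back(position);
        // Round the size up to a whole number of 2048-byte blocks.
        position += ((size + 0x7FF) >> 0xB) * kSectorSize;
    }
    return offsets;
}
```

For sizes of 2100, 100 and 2048 bytes, this yields the positions 0x800, 0x1800 and 0x2000.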

Between two data, the gap is filled mostly with garbage bytes, such as a series of 0x2E or 0x5F.

Libraries and tools

You can use the Emi Extractor tool made by Navarchos/Red Herring to extract the data from your EMI files.