This article was written for users of OpenPop.NET, but it can be read by anyone who wants to know how emails work.
The goal of this article is to help users understand:
- how emails are structured
- why OpenPop.NET is built the way it is
Let's get right to it!
An email is structured into a header part and a body part. The two parts are delimited by an empty line.
The header part contains all the headers of the message while the body part contains the actual content of the message.
Simple email example
A simple email might look like this:
From: foo@bar.net
To: bar@foo.net
Subject: Hello bar from foo

Hello bar! Did you notice this email?

Here the first three lines are the header part, the empty line marks the end of the headers, and the last line is the body part of the message.
The headers tell us that the email was sent from foo@bar.net and that the receiver is bar@foo.net.
They also tell us that the subject of the email is "Hello bar from foo".
The content of the message is the last line, and this content is simply some text that is meant to be read by bar@foo.net.
The format itself is described in RFC 822.
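To make the header/body split concrete, here is a minimal sketch that parses the simple email above. It uses Python's standard library `email` package purely for illustration; it is not OpenPop.NET code, and the variable names are chosen for this example only.

```python
from email import message_from_string

# The simple email from above. Note the empty line between headers and body.
raw = (
    "From: foo@bar.net\n"
    "To: bar@foo.net\n"
    "Subject: Hello bar from foo\n"
    "\n"
    "Hello bar! Did you notice this email?\n"
)

# The parser splits the text at the first empty line:
# everything before it becomes headers, everything after it is the body.
msg = message_from_string(raw)

headers = dict(msg.items())   # header name -> header value
body = msg.get_payload()      # the body as a string

print(headers)
print(body)
```

Without the empty line, a parser has no way to tell where the headers stop and the content begins.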
Early limitations
The RFC 822 specification had some severe limitations, which led to a new standard being proposed.
- A message can only contain ASCII characters, and for most people this is not enough.
- It is not possible to attach anything to the message, such as images or documents.
Because of these limitations, the Multipurpose Internet Mail Extensions (MIME) standard was proposed.
MIME Introduction
We all know that it is possible to attach a file to an email, but as just described, RFC 822 does not allow this.
To address the problem, an email could be structured into multiple parts.
One part could be used to contain the text for the recipient, while another part could be used for an attached file.
Some content could even be available in several formats. It is customary to include both a plaintext and an HTML version of the text sent to the recipient.
These are then held in different parts.
So far we have seen an email as containing a header part, and a body part. MIME changes this by structuring an email into multiple parts as just described.
Let us see an example of a structured email. The example below contains an HTML part and a part containing an attached MP3 file.
From: foo@bar.net
To: bar@foo.net
Subject: My new MP3!
Content-Type: multipart/mixed; boundary="111magic-boundary-string"

--111magic-boundary-string
Content-Type: text/html

Hello again bar! Here is my new super crazy MP3 file for you!
--111magic-boundary-string
Content-Type: audio/mpeg;
Content-Disposition: attachment; filename=new.mp3
Content-Transfer-Encoding: base64

VG9ueQ==
--111magic-boundary-string--

You may have noticed that this email is more complex.
The Content-Type header specifies what type the content is. In this example, the Content-Type is multipart/mixed.
This MIME type tells us that the main body of the email actually contains multiple parts.
Each part in the body is separated by a special boundary delimiter, which is declared in the boundary parameter of the Content-Type header.
Here the boundary is 111magic-boundary-string.
The parts inside the body are found between the boundary lines.
You might have noticed that every boundary line has two hyphens (--) in front of the boundary string, and that the last boundary line also ends with two hyphens.
The trailing hyphens tell the MIME parser that this is the very last time the boundary string will be mentioned: there are no more parts after this line.
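As a rough illustration of how a MIME parser sees this message, the sketch below feeds it to Python's standard library `email` package (not OpenPop.NET). The raw text uses the on-the-wire form, where every boundary line is the boundary string prefixed with two hyphens and the closing line also ends with two hyphens.

```python
from email import message_from_string

raw = """\
From: foo@bar.net
To: bar@foo.net
Subject: My new MP3!
Content-Type: multipart/mixed; boundary="111magic-boundary-string"

--111magic-boundary-string
Content-Type: text/html

Hello again bar! Here is my new super crazy MP3 file for you!
--111magic-boundary-string
Content-Type: audio/mpeg;
Content-Disposition: attachment; filename=new.mp3
Content-Transfer-Encoding: base64

VG9ueQ==
--111magic-boundary-string--
"""

msg = message_from_string(raw)

# The top-level Content-Type says the body holds several parts.
is_multipart = msg.is_multipart()
boundary = msg.get_boundary()

# For a multipart message, get_payload() returns the list of contained parts.
part_types = [part.get_content_type() for part in msg.get_payload()]

print(is_multipart, boundary, part_types)
```

Each element of the list is itself a small message with its own headers and body.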
Let us take a look at the first part:
Content-Type: text/html

Hello again bar! Here is my new super crazy MP3 file for you!
If you look closely, this is structured exactly like the simple email was: it has a header part and a body part.
This part could just as well have been a multipart/mixed part, in which case it would itself have had more parts inside it.
This part, however, does not include multiple parts. The Content-Type header tells us that the MIME part contains text, and that the text is HTML.
The second and last part of the email is the part that holds the attached MP3 file:
Content-Type: audio/mpeg;
Content-Disposition: attachment; filename=new.mp3
Content-Transfer-Encoding: base64

VG9ueQ==
This part has a Content-Type of audio/mpeg, which specifies that the content inside is an MP3 file.
It also has a Content-Disposition header, which tells us that the part is an attachment and that the original filename was new.mp3.
The content of the part is VG9ueQ==, which looks a bit strange.
If you look at the Content-Transfer-Encoding header, it specifies that base64 has been used to encode the data.
So to get the actual MP3 content, we need to decode the body with a base64 decoder.
The base64 data here is just a sample: it actually reads "Tony" if you decode it.
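The decoding step can be sketched in a few lines. This example again uses Python's standard library rather than OpenPop.NET, and it parses the attachment part on its own, which works because a part has the same header/body layout as a simple email.

```python
import base64
from email import message_from_string

# The attachment part from the example, treated as a small message of its own.
part = message_from_string(
    "Content-Type: audio/mpeg\n"
    "Content-Disposition: attachment; filename=new.mp3\n"
    "Content-Transfer-Encoding: base64\n"
    "\n"
    "VG9ueQ==\n"
)

# The filename comes from the Content-Disposition header.
filename = part.get_filename()

# decode=True applies the encoding named in Content-Transfer-Encoding.
data = part.get_payload(decode=True)

# The same result, decoding the base64 text by hand.
manual = base64.b64decode("VG9ueQ==")

print(filename, data, manual)
```

In a real email, `data` would hold the bytes of the MP3 file, ready to be written to disk.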
Emails as trees
You have just seen an example of an email containing multiple parts.
- The main body was a multipart/mixed part, which contains parts inside.
  - The first of these parts was a text/html part.
  - The second was an audio/mpeg part.
We can represent this by a typical computer science tree:
Figure 1.
This is what most emails look like.
As hinted earlier, a part could itself specify that it includes more parts.
A tree for such an example could look like:
Figure 2.
Let us interpret this email.
- The main body is a multipart/mixed part, which contains parts inside.
  - The first part of the main body is a multipart/alternative part. It specifies that it includes more parts inside, and that these are alternatives of each other.
    - The first part of the multipart/alternative part is a text/plain part. It contains plain text.
    - The second part of the multipart/alternative part is a text/html part. It contains HTML.
  - The second part of the main body is an audio/mpeg part. It contains MP3 data.
  - The third part of the main body is an image/png part. It contains an image.
  - The fourth part of the main body is an image/png part. It contains another image.
This is a rather common email structure.
Typically a user enters content into an email, which might contain formatted text.
The formatted text will nearly always be sent as HTML.
Since some email clients cannot cope with HTML, most email composers will include a plaintext part of the same message.
This part will not have any formatting like bold or italic text.
The images attached could be referred to from the HTML part, and therefore be shown to the user instead of being treated as an attachment.
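The tree described for Figure 2 can be rebuilt and printed with a short recursive walk. The sketch below uses Python's standard library `email` package, not OpenPop.NET; the payload bytes are placeholders and `outline` is a hypothetical helper name used only here.

```python
from email.mime.audio import MIMEAudio
from email.mime.image import MIMEImage
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText

# multipart/alternative: the same text as plain text and as HTML.
alternative = MIMEMultipart("alternative")
alternative.attach(MIMEText("Hello bar!", "plain"))
alternative.attach(MIMEText("<b>Hello bar!</b>", "html"))

# multipart/mixed root: the text alternatives plus three attachments.
# The byte strings are placeholders, not real MP3 or PNG data.
root = MIMEMultipart("mixed")
root.attach(alternative)
root.attach(MIMEAudio(b"placeholder mp3 bytes", _subtype="mpeg"))
root.attach(MIMEImage(b"placeholder png bytes", _subtype="png"))
root.attach(MIMEImage(b"more placeholder png bytes", _subtype="png"))


def outline(part, depth=0):
    """Return one indented line per part, visiting children depth-first."""
    lines = ["  " * depth + part.get_content_type()]
    if part.is_multipart():
        for child in part.get_payload():
            lines.extend(outline(child, depth + 1))
    return lines


tree = outline(root)
print("\n".join(tree))
```

The printed outline has the same shape as the bullet list above: only multipart parts have children, and every leaf carries actual content.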
Messages with non-ASCII characters
We have now seen messages with attached files.
The other limitation of RFC 822 messages was that only ASCII characters could be included.
There are two issues we need to consider when we want to fix this problem.
- The email itself can still only contain ASCII characters.
- The receiver needs to know how to interpret the bytes sent, that is, how to map bytes to characters.
The first issue has a simple fix. We encode the bytes sent into some format which only outputs ASCII characters.
Two such encodings are defined:
- quoted-printable
- base64
It is out of the scope of this article to describe these encodings, but they get the job done.
We specify the encoding used in the Content-Transfer-Encoding header.
The second issue was how to map bytes to characters. This is simple: we use a character encoding.
Each character encoding has a name, which we specify in the Content-Type header using charset=<character encoding>.
Here is an example:
Content-Type: text/html; charset=ISO-8859-1
Content-Transfer-Encoding: quoted-printable

This is some special danish text: =E6=F8=E5=C6=D8=C5
Here we see that the message is using ISO-8859-1 as the character set and quoted-printable to encode the data as ASCII.
The text says: This is some special danish text: æøåÆØÅ
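Both steps, undoing the transfer encoding and then mapping bytes to characters, can be sketched as follows. The example uses Python's standard library `email` package for illustration and is not OpenPop.NET code.

```python
from email import message_from_string

part = message_from_string(
    "Content-Type: text/html; charset=ISO-8859-1\n"
    "Content-Transfer-Encoding: quoted-printable\n"
    "\n"
    "This is some special danish text: =E6=F8=E5=C6=D8=C5\n"
)

# Step 1: undo the transfer encoding (quoted-printable) to get raw bytes.
raw_bytes = part.get_payload(decode=True)

# Step 2: map the bytes to characters using the declared character set.
charset = part.get_content_charset()   # taken from the Content-Type header
text = raw_bytes.decode(charset)

print(charset)
print(text)
```

Skipping the second step, or guessing the wrong character set, is what typically produces garbled letters in a mail client.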
MIME is complex
There is a lot more to MIME than what has been introduced here.
Here are some examples:
- Line length in emails is limited. Therefore headers can be split across several lines.
- Some headers need to include non-ASCII characters. Several inline encodings are used for this.
- When a header like Content-Disposition describes a long filename, normal header splitting cannot be used. A special encoding is used in such cases.
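To give a feel for two of these cases, the sketch below decodes a Subject header that uses an inline encoding and a long filename that is split into numbered pieces across folded header lines. The header values are made up for this illustration, and the code uses Python's standard library `email` package, not OpenPop.NET.

```python
from email import message_from_string
from email.header import decode_header, make_header

# Made-up headers: an inline-encoded Subject, and a Content-Disposition
# whose filename is split into numbered pieces on continuation lines.
raw = """\
Subject: =?ISO-8859-1?Q?Hej_=E6=F8=E5?=
Content-Type: audio/mpeg
Content-Disposition: attachment;
 filename*0="a-very-long-file-name-that-";
 filename*1="does-not-fit-on-one-line.mp3"
Content-Transfer-Encoding: base64

VG9ueQ==
"""

msg = message_from_string(raw)

# Inline encoding: charset and encoding are named inside the header value.
subject = str(make_header(decode_header(msg["Subject"])))

# Split filename: the numbered pieces are joined back together.
filename = msg.get_filename()

print(subject)
print(filename)
```

A parser that ignores these rules would show the raw `=?ISO-8859-1?Q?...?=` text to the user and would miss the filename entirely.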
If you really want to understand MIME emails, you will have to read the RFCs yourself.
How MIME structure relates to OpenPop.NET
You have now seen that emails are complex and that the structure of an email is not a flat header and body part.
Each part itself can have multiple parts inside.
OpenPop.NET understands the hierarchy and represents it using a Message class and a MessagePart class.
The full message is represented as a Message object, while each part of the message is represented using a MessagePart object.
Consider again Figure 1:
As just described, the different parts of the message are mapped to objects. Here is the same picture, annotated with the class used for each part:
Figure 3
The root Message object holds the main headers and a reference to the main body part of the message.
This reference is held in the MessagePart property.
When you have a multipart MessagePart, each contained part is available through the MessageParts property of the MessagePart object.
When a MessagePart is not a multipart, like text/html or image/png, you can get the raw bytes using the Body property.
In the case of text/html, you want the string instead, which is available using the GetBodyAsText() method.
Most often with attachments, you will simply need to save the bytes to a file. You can use the Save method for this.
You might now ask yourself whether you really need to visit the entire email hierarchy to find a MessagePart whose content is text/plain or text/html.
The answer is yes, but there are some helper methods in the Message class you can use for this.
See the documentation of the Message class for these. You can also take a look at this example.
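The traversal that such a helper has to perform can be sketched as a depth-first search over the part tree. This is Python using the standard library `email` package, not OpenPop.NET code, and `find_first_part` is a hypothetical name chosen for this illustration; it does not correspond to a specific OpenPop.NET method.

```python
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText


def find_first_part(part, content_type):
    """Return the first non-multipart part with the given type, or None."""
    if part.is_multipart():
        # A multipart part has no content of its own: search its children.
        for child in part.get_payload():
            found = find_first_part(child, content_type)
            if found is not None:
                return found
        return None
    return part if part.get_content_type() == content_type else None


# A small message: multipart/mixed holding a multipart/alternative.
alternative = MIMEMultipart("alternative")
alternative.attach(MIMEText("Hello bar!", "plain"))
alternative.attach(MIMEText("<b>Hello bar!</b>", "html"))
root = MIMEMultipart("mixed")
root.attach(alternative)

html_part = find_first_part(root, "text/html")
html_text = html_part.get_payload(decode=True).decode(
    html_part.get_content_charset()
)
missing = find_first_part(root, "image/png")   # no such part in this message

print(html_text)
print(missing)
```

The search has to descend into every multipart part because the text may sit several levels down, as in the multipart/alternative case shown earlier.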
You should now have a better understanding of what emails are and how they are structured.
You should also have a better understanding of why OpenPop.NET is built the way it is.