Generating Mail Addresses in AutoFixture

20 April 2015

I’m wrapping up stage 1 of some changes to AutoFixture that slightly adjust its new logic for generating a MailAddress. In case you’re not familiar, MailAddress is the standard .NET Framework type (in System.Net.Mail) for representing email addresses. A prior pull request to AutoFixture added a new builder that produces these mail addresses, and essentially it did so this way:


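	// Use a fresh GUID as the local part so each generated address is unique.
	var name = Guid.NewGuid().ToString();
	// Pick one of the three reserved example domains based on the name's hash.
	var domain = this.fictitiousDomains[(uint)name.GetHashCode() % 3];

	// "name <name@domain>": the name doubles as display name and local part.
	var email = string.Format(CultureInfo.InvariantCulture, "{0} <{0}@{1}>", name, domain);
	return new MailAddress(email);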

Basically, the MailAddressGenerator would create two strings by hand, one to represent the part before the ‘@’ and one for the domain. To ensure created email addresses couldn’t slip out into the wild, fictitiousDomains is simply an instance-level collection of safe domains (e.g. example.com, example.org, example.net). However, in AutoFixture it is typically not good form for a builder to produce additional specimens by hand when generating a certain type. Instead, that should be deferred to the ISpecimenContext parameter. In other words, something like the below would be preferable:


	var name = context.Create<SomeTypeRepresentingEmailName>();
	var domain = context.Create<SomeTypeRepresentingEmailDomain>();

The upsides to this approach:

  • Deferring creation of those specimens back to the builder chain allows you to better customize those individual pieces
  • It is easier to reason about MailAddressGenerator and its logic

Here’s a key question: since ultimately we need the string representation of our specimens, why don’t we just ask the ISpecimenContext to resolve them as strings within MailAddressGenerator? That would definitely work, but the risk is: suppose someone has customized string generation in AutoFixture to produce strings that result in invalid MailAddresses? For that reason, we opted to use a signal type to represent the specimen we needed.

First, in terms of names, an email address is made up of two parts: the local part and the domain, separated by the ‘@’. Any cursory web search will reveal that a number of RFCs touch on the format of email addresses, but the rules for the local part can roughly be described as: between 1 and 64 characters in length; a specific list of allowable special characters; only US-ASCII characters. That last one is fairly interesting. Here are some local parts that Mark offered that presumably he’s seen in the wild, but that are prohibited per the RFC:


	[Theory]
	[InlineData("ndøh")]
	[InlineData("ndöh")]
	[InlineData("åhnej")]
	[InlineData("ñoñó1234")]
	public void LocalParts_AreInvalid(string localPart)
	{
		// RFC-style pattern: a quoted string, or ASCII atoms with no leading, trailing, or doubled dots.
		var pattern = @"^(?!\.)(""([^""\r\\]|\\[""\r\\])*""|"
			+ @"([-A-Za-z0-9!#$%&'*+/=?^_`{|}~]|(?<!\.)\.)*)(?<!\.)$";

		// IsMatch returns true when the local part satisfies the pattern.
		var isValid = System.Text.RegularExpressions.Regex.IsMatch(localPart, pattern);
		Assert.False(isValid);
	}
	

How could that be? How could email addresses be in use that are invalid per the RFC? Well, therein lies the rub. The RFC says one thing, but email vendors can (and do) allow addresses that violate these rules, and that is their prerogative. In some ways that’s good, because the RFC is pretty restrictive as written, with basically no support for non-US characters. That is why the typical guidance you hear regarding validating email addresses is: don’t. Don’t bother. Perhaps you do some basic syntactic validation or maybe run some regular expressions to ensure there’s nothing suspicious, but otherwise the only true way to know whether an email address is valid is to send an email message to it.

That pretty quickly put me in an interesting spot. I pitched softening the rules a bit (such as supporting non-US-ASCII characters, but otherwise keeping the rules as-is), at which point Mark guided me to a different conclusion. How was AutoFixture planning to use the EmailAddressLocalPart specimen? As shown, we plan to request one from the ISpecimenContext, combine it with a domain, and then construct a MailAddress from it. So for our purposes, the thing that makes an EmailAddressLocalPart valid is simply that MailAddress is able to accept it. MailAddress already applies a lot of validation rules in its constructor. Unfortunately, these are implemented by a MailParser class that is internal to the System assembly, so we cannot directly access it (or at least not easily). And while we could reverse-engineer its rules, we’d be forever subject to the whims of bug fixes and modifications to that logic where MailAddress is concerned. For that reason, EmailAddressLocalPart has almost no validation in it (other than ensuring the local part constructor parameter is not null and not empty) and MailAddressGenerator becomes:


	try
	{
		// Defer creation of the local part to the context so it can be customized.
		var localPart = context.Resolve(typeof(EmailAddressLocalPart)) as EmailAddressLocalPart;

		if (localPart == null)
		{
			return new NoSpecimen(request);
		}

		var domain = this.fictitiousDomains[(uint)localPart.GetHashCode() % 3];

		var email = string.Format(CultureInfo.InvariantCulture, "{0} <{0}@{1}>", localPart, domain);

		// MailAddress performs the real validation and throws if the address is rejected.
		return new MailAddress(email);
	}
	catch (ArgumentException)
	{
		return new NoSpecimen(request);
	}
	catch (FormatException)
	{
		return new NoSpecimen(request);
	}
	

Since we’re asking the context for an EmailAddressLocalPart, the context needs to be able to resolve it, so we also have an EmailAddressLocalPartGenerator, which resolves a string from the context and produces an EmailAddressLocalPart from it.
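To make the signal type concrete, here is a minimal sketch of what a type like EmailAddressLocalPart could look like, based only on the description above: it rejects null and empty input and otherwise leaves validation to MailAddress. The member names and the ToString override are my assumptions for illustration, not necessarily what ships in AutoFixture.

```csharp
using System;

// Hypothetical sketch of a signal type; the real AutoFixture class may differ.
public sealed class EmailAddressLocalPart
{
    private readonly string localPart;

    public EmailAddressLocalPart(string localPart)
    {
        // The only validation: the value must be present and non-empty.
        if (localPart == null)
        {
            throw new ArgumentNullException("localPart");
        }

        if (localPart.Length == 0)
        {
            throw new ArgumentException("The local part must not be empty.", "localPart");
        }

        this.localPart = localPart;
    }

    public string LocalPart
    {
        get { return this.localPart; }
    }

    // Assumed convenience: lets the value be passed straight to string.Format.
    public override string ToString()
    {
        return this.localPart;
    }
}
```

With something like this in place, a generator only has to take whatever string the context resolves and wrap it, returning NoSpecimen when the constructor rejects the value.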

What If I've Modified AutoFixture's String Generation?

Let’s say you’re the person who has modified AutoFixture’s string generation, you’re also trying to use its MailAddress support, and it’s breaking. What should you do? In case you’re not familiar with AutoFixture’s design, the builders it applies can be separated into two stages. First, it has a collection of builders that it attempts to use to satisfy type requests. These builders are in the Customizations property on the Fixture instance itself. When AutoFixture gets a request to create a type, it first checks whether any of these builders can satisfy it, and if one can, it uses that value. If none of the custom builders can handle the request, AutoFixture next uses its Engine to resolve the type. The Engine is a separate collection of builders, and these are what you could consider the core AutoFixture builders that most everyone relies on without thinking about it.

The generators I discussed above, MailAddressGenerator and EmailAddressLocalPartGenerator, are both hooked into the Engine automatically, so they are always available. But you as the user are free to define your own builder that resolves MailAddress or EmailAddressLocalPart however you see fit, to work around a case where your string generation causes mail address validation failures. This seems fairly unlikely, but given AutoFixture’s goal of being widely applicable to many situations, it is set up to support these needs, although you take on a bit more of the burden yourself.
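As a rough illustration of that workaround, the sketch below registers a custom builder in Customizations so that MailAddress requests never touch string generation at all. SafeMailAddressBuilder and Example are hypothetical names of my own, and the namespaces assume the AutoFixture 3.x package (Ploeh.AutoFixture); treat it as one possible approach rather than a recommended implementation.

```csharp
using System;
using System.Net.Mail;
using Ploeh.AutoFixture;         // assumption: AutoFixture 3.x ("AutoFixture" in 4.x)
using Ploeh.AutoFixture.Kernel;  // assumption: AutoFixture 3.x ("AutoFixture.Kernel" in 4.x)

// Hypothetical builder that creates MailAddress instances without using
// the (customized) string generation.
public class SafeMailAddressBuilder : ISpecimenBuilder
{
    public object Create(object request, ISpecimenContext context)
    {
        // Only handle requests for MailAddress; decline everything else.
        if (!typeof(MailAddress).Equals(request))
        {
            return new NoSpecimen(request); // 4.x uses the parameterless constructor
        }

        // A GUID without dashes is a local part MailAddress accepts.
        var localPart = Guid.NewGuid().ToString("N");
        return new MailAddress(localPart + "@example.com");
    }
}

public static class Example
{
    public static MailAddress CreateAddress()
    {
        var fixture = new Fixture();

        // Customizations are consulted before the Engine's built-in builders.
        fixture.Customizations.Add(new SafeMailAddressBuilder());

        return fixture.Create<MailAddress>();
    }
}
```

Because the builder lives in Customizations, it wins over the built-in MailAddressGenerator for MailAddress requests, while every other type still flows through to the Engine.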


The Right Design Makes Your Job Easier, Not Harder

21 March 2015

In the last post, we examined the decorator, chain of responsibility, and composite patterns against a particular refactoring. As proof that you learn more from teaching than you ever do as a student, while teaching one of my colleagues this material yesterday, I had a bit of a realization. I’m sure others have had it before me, but it really struck me, so I feel compelled to capture those thoughts.

After we ended up with implementations of the same interface in each of the three patterns, I concluded by saying the composite seemed an improper choice but picking between the chain and the decorator was largely a matter of preference. That turns out to be wrong. In the presented situation, the decorator is clearly a better choice than the chain, and this post will explain why.

From the original problem, we need to retrieve a Registration from the database, update one field on it based on calling a CRM service, and then return the single, complete Registration. Therein lies the key: the bulk of the data is coming from the database. We are adding to it exactly 1 field. Which pattern seems to fit that more closely?

In our decorator implementation, our inner component returns its Registration from the database. Our decorator first calls the inner component, then simply updates the single field it cares about to the value it pulls from the CRM. In the chain implementation, we already have to do things a little differently than in your typical chain. Instead of performing our processing and then calling the next component, we call next first, then add on top of it. This is necessary because our interface does not take a Registration as a parameter, so the only way to access the other component’s instance is as a return value. This led us to a case where we created a third component to seed the chain by simply creating a new Registration to return. But let’s think about that for a minute.

If GetRegistrationFromDatabase receives a blank Registration from its linked component, we have two options. First, after we retrieve the Registration from the database, we can write a method to copy over the properties one by one (presumably using reflection so it’s more dynamic and lowers the chance that a field added later is missed). But that’s a lot of code we didn’t have to write before. Thus option 2 is to ignore the value we’re given and just return the Registration pulled from the database. But if we do that, we no longer need the seed component at all. Once it’s removed, our GetRegistrationFromDatabase component is now essentially the terminal end of the chain: it starts the ball rolling, since we want to use its Registration and build on top of it. Well, that’s essentially a decorator under another name!
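For reference, the decorator shape described above can be sketched in a few lines. Only Registration and GetRegistrationFromDatabase are names from the original problem; the IRegistrationSource interface, the property names, the CrmStatusDecorator class, and the delegate standing in for the CRM service are all assumptions made for illustration, and the database call is stubbed out.

```csharp
using System;

public class Registration
{
    public int Id { get; set; }
    public string AttendeeName { get; set; } // stands in for the bulk of the data
    public string CrmStatus { get; set; }    // the single field filled from the CRM
}

// Hypothetical interface: note it does not take a Registration as a parameter.
public interface IRegistrationSource
{
    Registration GetRegistration(int id);
}

// Inner component. A real version would query the database; this is a stub.
public class GetRegistrationFromDatabase : IRegistrationSource
{
    public Registration GetRegistration(int id)
    {
        return new Registration { Id = id, AttendeeName = "Attendee " + id };
    }
}

// Decorator: calls the inner component first, then updates exactly one field.
public class CrmStatusDecorator : IRegistrationSource
{
    private readonly IRegistrationSource inner;
    private readonly Func<int, string> lookupCrmStatus; // stand-in for the CRM service

    public CrmStatusDecorator(IRegistrationSource inner, Func<int, string> lookupCrmStatus)
    {
        if (inner == null) throw new ArgumentNullException("inner");
        if (lookupCrmStatus == null) throw new ArgumentNullException("lookupCrmStatus");

        this.inner = inner;
        this.lookupCrmStatus = lookupCrmStatus;
    }

    public Registration GetRegistration(int id)
    {
        var registration = this.inner.GetRegistration(id);
        registration.CrmStatus = this.lookupCrmStatus(id);
        return registration;
    }
}
```

Composing the two is a one-liner, such as new CrmStatusDecorator(new GetRegistrationFromDatabase(), id => "Active"), and there is no seed component and no property-copying code anywhere in sight.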

So that leads me to my realization and the point of this post. When we arrived at the correct design pattern, the work we had to do was very easy. In fact, the first implementation we started with, where we simply put our CRM service call into the original method, is essentially identical to our decorator; the logic is just split up across the two components. But when we picked the composite and the chain, we instantly had to grapple with a new issue (how do we get a Registration into the chain) and code we didn’t have to write previously (copying/merging Registrations together). As soon as we find the problem getting harder instead of easier during a refactoring, we should presume our design or pattern selection may be faulty and reconsider.

Sometimes I don’t quite know the best way to approach the design for a problem, and even tests don’t help me get out of the rut. I don’t know where I read this, but in those cases I will simply pick a pattern and try to solve the problem using it. In pretty short order I can see whether things are getting better or worse and use that as feedback to inform my design. I posit that no one really solves hard problems anyway. Instead, we break them into smaller, easier problems and solve those instead. The ability to decompose such a problem requires a lot of skill and experience, but fundamentally that’s the nature of a developer’s job. Patterns are a great tool to have, as they build on the experience of those before us (most or all of whom were far smarter and more skilled than I), but only if you pick the right one. Use the problem itself, in comparison to the pattern, as a guide. If it’s not getting easier, either break it down more or consider whether you’ve picked the right pattern.