As far as I know, fragmentation takes place at the Network Layer, which in turn knows the MTU of the Data Link layer. It has to be this way: if you fragmented at layer 2, the frame fragments would have to be buffered in the switches' memory and reassembled into full packets before a router could understand and route them. On the other hand, if you fragment at the Network Layer (which is what actually happens), a router can route each fragment on its own. The fragments only have to be reassembled when they finally arrive at the destination, where the receiving host's IP layer rebuilds the original datagram and then hands its payload (the Transport Layer segment) up to TCP or UDP.
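To make the "routers forward fragments, only the destination reassembles" point concrete, here is a minimal sketch of IPv4-style fragmentation and reassembly. It assumes a fixed 20-byte IP header with no options and ignores the Identification field, the DF flag and checksums; `Fragment`, `fragment` and `reassemble` are hypothetical names for illustration, not any real stack's API. The one detail it does model faithfully is that the Fragment Offset field counts in 8-byte units, so every fragment except the last must carry a multiple of 8 payload bytes.

```python
from dataclasses import dataclass

# Assumption: minimal IPv4 header (20 bytes, no options).
IP_HEADER_LEN = 20


@dataclass
class Fragment:
    offset_units: int     # Fragment Offset field, in 8-byte units
    more_fragments: bool  # MF flag: True on every fragment except the last
    data: bytes           # slice of the original datagram's payload


def fragment(payload: bytes, mtu: int) -> list[Fragment]:
    """Split a datagram payload so each fragment (header + data) fits the link MTU."""
    # Largest payload per fragment, rounded down to a multiple of 8
    # because offsets are expressed in 8-byte units.
    max_data = (mtu - IP_HEADER_LEN) // 8 * 8
    if max_data <= 0:
        raise ValueError("MTU too small to carry any payload")
    frags = []
    offset = 0
    while offset < len(payload):
        size = min(max_data, len(payload) - offset)
        is_last = offset + size >= len(payload)
        frags.append(Fragment(offset // 8, not is_last, payload[offset:offset + size]))
        offset += size
    return frags


def reassemble(frags: list[Fragment]) -> bytes:
    """Destination-side reassembly; fragments may arrive in any order."""
    ordered = sorted(frags, key=lambda f: f.offset_units)
    if not ordered or ordered[-1].more_fragments:
        raise ValueError("last fragment (MF=0) is missing")
    buf = bytearray()
    for f in ordered:
        # Each fragment must start exactly where the previous one ended.
        if f.offset_units * 8 != len(buf):
            raise ValueError("gap or overlap between fragments")
        buf += f.data
    return bytes(buf)
```

For example, a 4000-byte datagram (3980 bytes of payload) crossing a link with a 1500-byte MTU comes out as three fragments carrying 1480, 1480 and 1020 bytes at offsets 0, 185 and 370 (in 8-byte units). Each of those is a complete IP packet a router can forward independently, and `reassemble` can rebuild the payload even if they arrive out of order.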