By Taiwanese law, TSMC isn’t allowed to move cutting-edge processes to its US plant. The overseas operations have to be at least one generation behind.
From a strategic point of view, it makes sense for the Taiwanese government to do this. They don’t want the US to suck them dry and then cut a deal with the mainland.
No AI org of any significant size will ever disclose its full training set, and it’s foolish to expect such a standard to be met. There is just too much liability. No matter how clean your data collection procedure is, there’s no way to guarantee that a data set with billions of samples won’t contain at least one thing a lawyer could zero in on and drag you into a lawsuit over.
What DeepSeek did, which was full disclosure of methods in a scientific paper, release of weights under the MIT license, and release of some auxiliary code, is as much as one can expect.