Given fs, every sample point lies on a time grid with a spacing of 1/fs seconds. Then exactly how many sample points are there in the expression

x(t1~t2), which is the audio signal x between t1 ms and t2 ms?

If t1 is exactly on the time grid, the first index is the one corresponding to t1.

If t2 is exactly on the time grid, the last index is the index corresponding to t2, minus one.

(Why minus one? Consider a 1-second signal with fs = 44100 Hz: its length should be 44100 samples, not 44101.)
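When both endpoints are on the grid, the "minus one" rule amounts to taking the half-open interval [t1, t2). A minimal Python sketch of that case, assuming times in ms and 0-based indices (so the 1-based index k used in this note corresponds to k - 1 here); the function name `on_grid_indices` is hypothetical:

```python
from fractions import Fraction

def on_grid_indices(t1_ms, t2_ms, fs):
    """0-based sample indices covering [t1, t2) when t1 and t2 lie exactly on the 1/fs grid."""
    n1 = Fraction(t1_ms) * fs / 1000  # t1 measured in sample periods
    n2 = Fraction(t2_ms) * fs / 1000  # t2 measured in sample periods
    if n1.denominator != 1 or n2.denominator != 1:
        raise ValueError("t1 and t2 must be integer multiples of 1/fs")
    # range() is half-open: it stops just before n2, i.e. "t2's index minus one".
    return range(int(n1), int(n2))

idx = on_grid_indices(0, 1000, 44100)  # one second at 44100 Hz
print(len(idx), idx[0], idx[-1])
```

Python's `range` is half-open as well, which is why one second yields 44100 indices (0 through 44099, or 1 through 44100 in 1-based terms) rather than 44101.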

Then, what if t1 or t2 is not exactly on the grid?

Should the total number of samples be ceil(fs * (t2 - t1)), fix(fs * (t2 - t1)), or even round(fs * (t2 - t1)), with t1 and t2 expressed in seconds?

Consider the following example (fs = 22050 Hz):

Sample index   Time (s)    Time as a multiple of 1/fs
1st            0.000000    0/fs
2nd            0.000045    1/fs
3rd            0.000091    2/fs
...
221st          0.009977    220/fs
222nd          0.010023    221/fs
223rd          0.010068    222/fs
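The time column can be regenerated from t = (k - 1)/fs for the k-th sample (1-based). A small Python sketch; the helper name `sample_time` is chosen here for illustration only:

```python
FS = 22050  # sampling rate of the example, in Hz

def sample_time(k, fs):
    """Time in seconds of the k-th sample, counting from k = 1 at t = 0."""
    return (k - 1) / fs

# Print the rows shown in the table: index, time in seconds, multiple of 1/fs.
for k in (1, 2, 3, 221, 222, 223):
    print(f"{k:>5}  {sample_time(k, FS):.6f}  {k - 1}/fs")
```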

Now, counting every sample at or after t1 and strictly before t2, we know that

x(0~10) should be from index 1 through index 221 --> 221 samples

x(0.03~10.03) should be from index 2 through index 222 --> 221 samples

But,

x(0.05~10.05) should be from index 3 through index 222 --> 220 samples (index 2, at 0.045 ms, now falls before t1, while index 223, at 10.068 ms, still falls after t2)
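The three cases above can be checked mechanically. A minimal sketch, assuming the half-open [t1, t2) convention, times passed in ms as strings or integers so that `fractions.Fraction` keeps them exact (a float such as 0.03 is not exact in binary), and 1-based indices as in the text; `sample_span` is a hypothetical helper:

```python
import math
from fractions import Fraction

def sample_span(t1_ms, t2_ms, fs):
    """Return (first, last, count) of 1-based sample indices with t1 <= time < t2.

    Sample k sits at (k - 1) / fs seconds; t1_ms and t2_ms are in milliseconds.
    """
    a = Fraction(t1_ms) * fs / 1000  # t1 measured in sample periods
    b = Fraction(t2_ms) * fs / 1000  # t2 measured in sample periods
    first = math.ceil(a) + 1         # first grid point at or after t1, made 1-based
    last = math.ceil(b)              # last grid point strictly before t2, made 1-based
    return first, last, last - first + 1

for t1, t2 in (("0", "10"), ("0.03", "10.03"), ("0.05", "10.05")):
    print(f"x({t1}~{t2}):", sample_span(t1, t2, 22050))
```

This reproduces the index ranges listed above: (1, 221, 221), (2, 222, 221), and (3, 222, 220).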

Therefore, the answer to the question above (how many samples?) depends on the starting time point. It's neither ceil, fix, nor round.
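To see that no single function of the duration works, the sketch below fixes the duration at 10 ms, slides the start time in 0.01 ms steps, and compares the exact count with the three candidate formulas. It assumes the same half-open [t1, t2) convention. `math.trunc` stands in for MATLAB's `fix`, and `floor(n + 1/2)` stands in for MATLAB's `round` on non-negative values, because Python's built-in `round` rounds halves to even (`round(220.5)` gives 220). The names `exact_count` and `candidates` are illustrative.

```python
import math
from fractions import Fraction

FS = 22050
DURATION_MS = Fraction(10)

def exact_count(t1_ms, duration_ms, fs):
    """Grid points k / fs (k = 0, 1, ...) with t1 <= k / fs < t1 + duration, times in ms."""
    a = Fraction(t1_ms) * fs / 1000
    b = (Fraction(t1_ms) + Fraction(duration_ms)) * fs / 1000
    return math.ceil(b) - math.ceil(a)

n = DURATION_MS * FS / 1000  # the duration in sample periods: 441/2 = 220.5
candidates = {
    "ceil": math.ceil(n),                     # 221
    "fix": math.trunc(n),                     # 220, truncation toward zero like MATLAB fix
    "round": math.floor(n + Fraction(1, 2)),  # 221, half away from zero (valid for n >= 0)
}

# Slide the start from 0.00 ms to 0.09 ms in 0.01 ms steps; the duration stays 10 ms.
counts = {Fraction(k, 100): exact_count(Fraction(k, 100), DURATION_MS, FS) for k in range(10)}

print(candidates)
print(sorted(set(counts.values())))  # the exact count takes more than one value
```

Whichever formula is picked, it returns one fixed number, while the true count switches between 220 and 221 depending on where t1 falls relative to the grid.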

This means that you shouldn't assume, for example, that the length of x(t1~t1+10) is the same as that of noise(10).

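The length of noise(10) is either the same as that of x(t1~t1+10) or one sample longer. Because noise(10) always begins at zero, which lies on the grid, it gets the largest possible number of samples for a 10 ms span, ceil(fs * 0.010); a segment of the same duration that starts between grid points contains either that many samples or one fewer.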
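The 0-or-1 difference follows from the inequality ceil(x) + ceil(y) - 1 <= ceil(x + y) <= ceil(x) + ceil(y), with x = fs * t1 and y = fs * D for a duration D in seconds: a segment starting at t1 holds ceil(x + y) - ceil(x) samples, which is ceil(y) or ceil(y) - 1, while a signal starting at zero always holds ceil(y). The sketch below checks this numerically under the half-open [t1, t2) convention. Since this note only says that noise(10) starts at zero, `noise` here is just a stand-in list with the length noise(10) would have, and trimming it to the segment's length is one simple option assumed for illustration; `count_in` and `length_gaps` are hypothetical helpers.

```python
import math
from fractions import Fraction

def count_in(t1_ms, t2_ms, fs):
    """Number of grid points k / fs (k = 0, 1, ...) with t1 <= k / fs < t2, times in ms."""
    return math.ceil(Fraction(t2_ms) * fs / 1000) - math.ceil(Fraction(t1_ms) * fs / 1000)

def length_gaps(duration_ms, fs, starts_ms):
    """Set of (length starting at 0) - (length starting at t1) over the given start times."""
    from_zero = count_in(0, duration_ms, fs)  # what a signal starting at t = 0 would get
    return {from_zero - count_in(t1, Fraction(t1) + Fraction(duration_ms), fs) for t1 in starts_ms}

starts = [Fraction(k, 100) for k in range(10)]   # 0.00 to 0.09 ms in 0.01 ms steps
print(sorted(length_gaps(10, 22050, starts)))    # 10 ms at 22050 Hz: gap is 0 or 1
print(sorted(length_gaps(1000, 44100, starts)))  # fs * D is an integer here, so the gap is always 0

# Stand-in lists: the noise is never shorter than the segment, so trimming it always works.
segment = [0.0] * count_in("0.05", "10.05", 22050)  # same length as x(0.05~10.05)
noise = [0.0] * count_in(0, 10, 22050)              # same length as noise(10)
noise = noise[:len(segment)]
```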