Curious Cosmos
Razimus

John Titor Vs. Morey Haber (Text Comparison)


 

It's not about one match, or the chances of one match, it's about the complete number of matches as a whole, and the chances of the complete number of matches as a whole.

 

If someone can find better evidence, I will happily accept it; I have yet to find better evidence than this.


Even so, when two people write about a related subject, say computer networking, certain words tend to be used by default even when other words are available, e.g. "infrastructure". Most of the connections are between the phrases that the two used, which could be labeled coincidence, but coincidence seems quite unlikely given how similar their writing styles are.

 

The main question is: Who is the orchestrator?


 

It's not about one match, or the chances of one match, it's about the complete number of matches as a whole, and the chances of the complete number of matches as a whole.

 

If someone can find better evidence, I will happily accept it; I have yet to find better evidence than this.

 

It's not about "if someone can find better evidence." You posted a data set - word and phrase matches. You then attached personal and, in the post at least, unsubstantiated qualifiers to the matches, i.e. "uncommon word match" or "rare context match." They are unsubstantiated because you don't state why they are "uncommon" or "rare", how you determined that they are rare, just how rare they are given the topics being discussed by each person (which goes to describing in detail the control and experimental groups), the statistical meaning of the rarity and, at the bottom line, what statistical method you used to decide that the apparent commonalities are significant enough to justify calling it a match.

 

So, it's not evidence. It's just a data set followed by personal opinions.

 

Run the numbers, do the statistical analysis, post the numbers, post the statistical tests that you used and your results. And when you do the research you must use every written word ever posted by Haber, not just the posts (articles, etc.) that you find interesting. Only using "interesting" articles is called cherry picking and it always results in Type 1 or Type 2 experimental errors because the analysis ends up being based in part on experimenter bias.

 

Don't get me wrong here. You may have come to the correct conclusion. I don't know (and no longer care). What I do care about is experimental design and analysis. Your method isn't even wrong because there is no statistical research method here.


Barring a complete list of "every word ever written by" and only being able to work with a small sample, it seems to me that the source you are comparing the sample to is what is important.

 

The "compiled reference material" would be the "hallmark" that any sample is tested against.

 

Isn't a list of that caliber usually decided upon by peer review?

 

How to acquire a complete Haber "set"? That would be difficult, to say the least.


 

The "compiled reference material" would be the "hallmark" that any sample is tested against.

 

Isn't a list of that caliber usually decided upon by peer review?

 

How to acquire a complete Haber "set"? That would be difficult, to say the least.

 

Yes, peer reviewers would decide through their critique whether the data set was adequate. But they wouldn't decide what the experimenter should or should not use. The experimental design and its data set are decided solely by the researcher. The reviewers' job is to judge the quality of the work, its adequacy and whether the conclusions are justified by the information and techniques used. They can (and do) make suggestions about where the design can be improved.

 

True; producing every word ever produced by Haber would be impossible to accomplish. We don't have access to the entire body of his writings. But every word that can be discovered should be included. Cherry picking equals experimental failure.

 

Context also plays a part. It possibly wouldn't be of much use to take as a sampling a technical manual written by Haber and compare that against Titor's informal online posts. Writing a tech manual usually involves the writer carefully choosing his or her words, whereas online posts are generally written on the fly. The specific words, doublets, triplets, etc. used by a writer when producing a tech manual could well have little correlation with the same person's informal writing (though that's a question that can only be answered through a separate experiment :) ).

 

Author identification is a very difficult task. Worse, in the end the most accurate analysis only tends to eliminate candidates rather than unambiguously identifying matches. The Chi Square Test, for example, only tells you the degree of confidence that the result varies from expected randomness.
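To make the chi-square point concrete, here is a minimal Python sketch. It is not the procedure used in any analysis discussed in this thread. It computes the Pearson chi-square statistic for a two-column table of word counts; the function names, the simple regex tokenizer and the sample strings are all assumptions made up for illustration. Turning the statistic into a confidence level still requires a chi-square table or a statistics library, which is left out here.

```python
import re
from collections import Counter


def word_counts(text):
    """Count lowercase words; the regex tokenizer is a simplifying assumption."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def chi_square(counts_a, counts_b):
    """Return (statistic, degrees_of_freedom) for two word-count tables."""
    # Only words that occur in at least one sample contribute a row.
    words = sorted(w for w in set(counts_a) | set(counts_b)
                   if counts_a.get(w, 0) + counts_b.get(w, 0) > 0)
    total_a = sum(counts_a.get(w, 0) for w in words)
    total_b = sum(counts_b.get(w, 0) for w in words)
    if total_a == 0 or total_b == 0:
        raise ValueError("both samples need at least one word")
    grand = total_a + total_b
    stat = 0.0
    for w in words:
        row = counts_a.get(w, 0) + counts_b.get(w, 0)
        for observed, column in ((counts_a.get(w, 0), total_a),
                                 (counts_b.get(w, 0), total_b)):
            # Expected count if both samples shared one word distribution.
            expected = row * column / grand
            stat += (observed - expected) ** 2 / expected
    return stat, len(words) - 1


# Hypothetical sample strings, not taken from either author.
sample_a = word_counts("the network infrastructure is old")
sample_b = word_counts("the infrastructure of the network is new")
statistic, dof = chi_square(sample_a, sample_b)
print(statistic, dof)
```

A statistic near zero means the two word distributions are about as similar as chance would predict; a large one only says they differ, which is why the test is better at eliminating candidates than at confirming a match.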

 

When we originally performed the Chi Square based linguistic analysis back in 2004 we did include every word written by Titor/TTO in his posts and did the same for both the randomly chosen control group as well as the experimental group. We also stated that no group (control or experimental) really provided a sufficiently large body of written material for a true analysis. That usually requires 100,000 to 200,000 words. I checked my old files a few minutes ago and the Titor/TTO concordance only contains ~4,600 words. He really didn't make very many posts. When he did post, a large proportion of the text was in the form of direct quotes from other posters. The quotes were eliminated because, obviously, they were not his writing. We didn't include in the Titor concordance the posts submitted by Pamela because we had no way of being sure that he wrote the material that she reposted. That was a judgement call but we believed that it was the correct call.
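A rough Python sketch of that kind of concordance building follows. It is an illustration only, not the tooling used in 2004. It assumes quoted text is marked with a leading ">" on each line, which is a hypothetical convention; real forum exports mark quotes differently. The 100,000-word floor comes from the figure above; the function names and sample posts are invented.

```python
import re
from collections import Counter

MIN_WORDS = 100_000  # lower bound mentioned above for a true analysis


def build_concordance(posts, quote_prefix=">"):
    """Count an author's own words, skipping lines that quote other posters."""
    concordance = Counter()
    for post in posts:
        for line in post.splitlines():
            if line.lstrip().startswith(quote_prefix):
                continue  # quoted material is not the author's writing
            concordance.update(re.findall(r"[a-z']+", line.lower()))
    return concordance


def is_large_enough(concordance, minimum=MIN_WORDS):
    """True when the total word count reaches the chosen minimum."""
    return sum(concordance.values()) >= minimum


# Hypothetical posts for illustration.
posts = [
    "> you said time travel\nTime travel is real",
    "The machine is real",
]
concordance = build_concordance(posts)
print(sum(concordance.values()), is_large_enough(concordance))
```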

 

Pamela herself was suspected by a few people of being Titor. Thus we had to include in both his and her concordances only writings that were directly posted under their names through their own TTI and Post-2-Post accounts. Mixing the two together by including materials posted by her with the claim that it was actually written by him would have skewed the results, especially given the small size of the sampling, toward an increased confidence that the results were not random (a possible match) - not because it was a true match but because we would have taken two separate concordances and combined them into one. Of course they would be similar in that case.


Darby, you said so yourself. Pamela was a courier. A postman.

Nobody is going to find Titor, Darby.

Because if you don't know, nobody knows.


(I couldn't get the video to load last night.)

Now, after watching it, I see that it is in fact all cherry picking of results.

 

Although, as has been shown before, many of John's responses were repetitive.

That being so, one could assume that the story was meant to be kept within a specific "story guideline", which would also rule out any meaningful qualitative testing, or at least inflate the error/hit rate.

By the way, there would be no specific way to measure whether the "positive hits" actually included errors, or at what rate.

 

There would be many false positives because the same portion of the story was repeated over and over, word for word.
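One hedged way to limit that effect is to collapse word-for-word repeats before counting any matches. The Python sketch below is an assumption about how that could be done, not something anyone in this thread reports doing; the sentence splitter is deliberately crude and the sample text is invented.

```python
import re


def unique_sentences(text):
    """Keep the first occurrence of each sentence, ignoring case and spacing."""
    seen = set()
    kept = []
    # Crude split on whitespace that follows ., ! or ? (an assumption).
    for sentence in re.split(r"(?<=[.!?])\s+", text.strip()):
        key = " ".join(sentence.lower().split())
        if key and key not in seen:
            seen.add(key)
            kept.append(sentence)
    return kept


# Made-up sample: the first sentence is repeated verbatim.
sample = "The machine works. It is heavy. The machine works."
print(unique_sentences(sample))
```

Any phrase matching run on the deduplicated list counts a repeated passage once instead of once per repetition.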

 

 

As far as a base text to measure from, you did the right thing, Darby, removing Pamela's text from the control group.

 

And hello again.

Darby, you said so yourself. Pamela was a courier. A postman.

Nobody is going to find Titor, Darby.

Because if you don't know, nobody knows.

 

I didn't say that I was one of the people who suspected her of being Titor. I never even entertained that thought. Frankly, there's no way in this or any other universe where Pamela could be Titor. That conclusion is based on writing ability, not style. He had it and she didn't.

 

But as a part of the experimental design we were forced by the circumstances to take into account both the suspicions and her actions which resulted in eliminating "his" posts that were put online under her account here at TTI. We were well aware that in so doing we risked a Type 2 statistical error - rejecting valid evidence (false negative). But we did disclose the decision to reject that portion of the writings and why. That left the decision to the consumer (or peer reviewer if you will) as to whether or not we made the correct choice.

 

As to finding Titor, I'm not looking so I won't find him. My comments to (about) Raz's video go only to his experimental design and methodology - or more precisely, his lack of both design and method. His results are of no particular importance to me.


I know that you did not suspect Pamela. But you did suspect the identity of the person who gave her the secret song reference. And in that suspicion, you were correct.
