Prevent patents by allowing crawlers - Administrivia - Ethereum Research


              
            
                DennisPeterson
                
              
                    November 17, 2018,  2:41pm
                  
                  
1
              
            
              One way to help protect against software patents is to make sure posts are stored in the Internet Archive, which gives them a reliable publication date. I just attempted to do that with a post, and it didn’t work, because the Archive won’t save anything whose robots.txt prohibits web crawlers.

Would it be possible to modify robots.txt?
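
For anyone who wants to check what a given robots.txt actually permits, here is a minimal sketch using Python's standard `urllib.robotparser`. The rule sets, the `ia_archiver` user-agent token, and the forum URL are all illustrative assumptions, not this site's real configuration or a confirmed description of how the Internet Archive's crawler identifies itself.

```python
from urllib.robotparser import RobotFileParser


def crawler_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if `robots_txt` lets `user_agent` fetch `url`."""
    parser = RobotFileParser()
    # parse() takes the file's lines, so no network request is made here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Hypothetical rule sets, for illustration only.
BLOCK_ALL = "User-agent: *\nDisallow: /\n"
ALLOW_ARCHIVER = (
    "User-agent: ia_archiver\n"  # assumed crawler token
    "Disallow:\n"                # an empty Disallow permits everything
    "\n"
    "User-agent: *\n"
    "Disallow: /\n"
)

# Hypothetical topic URL.
URL = "https://forum.example/t/some-topic/1"

if __name__ == "__main__":
    print(crawler_allowed(BLOCK_ALL, "ia_archiver", URL))
    print(crawler_allowed(ALLOW_ARCHIVER, "ia_archiver", URL))
```

Pasting the site's real robots.txt into `crawler_allowed` would show whether the rules themselves are what blocks a crawler, or whether the refusal comes from somewhere else.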

            
2 Likes
            

                dlubarov
                
              
                    November 17, 2018,  8:40pm
                  
                  
2
              
            
              The robots.txt looks fine to me; IA’s crawler should be able to discover and archive any topic pages it likes. It looks like IA’s crawler just hasn’t decided to archive very many topic pages, for whatever reason, but there are some. Here’s an example.

If someone representing the website could email info@archive.org, maybe they could adjust some configuration to make their crawler more likely to archive all the topics here.

Edit: I tried requesting that IA archive a topic page through their web UI, and IA did archive it (link), but the server didn’t give it the actual content of the topic; instead it returned “Oops! That page doesn’t exist or is private.” Might be a bug in Discourse? Or it could be some intentional bot-blocking code within Discourse, possibly with a rate limit that IA’s crawler sometimes exceeds.

            
2 Likes
            

                DennisPeterson
                
              
                    November 17, 2018, 10:05pm
                  
                  
3
              
            
              Interesting. On one request I got a message about robots.txt, but on several other attempts I got the same message you did.

            
1 Like
            

                DZack
                
              
                    December 20, 2018,  5:47pm
                  
                  
4
              
            
              I can think of another place to store posts for future “proof of publish date”

(or hashes of posts, anyway)


                DZack
                
              
                    December 21, 2018,  8:01pm
                  
                  
5
              
            
              …but actually, though: if we can get the posts in a standard plaintext format, say once a week, then hashing them, storing the hash on Ethereum, and hosting the content (on IPFS, or even just as a few redundant copies hosted somewhere) could be a neat project, and a nice illustration of an easy use case.
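
A rough sketch of the hashing step in Python, assuming the posts are already available as plaintext strings. The function names and the normalization rule are hypothetical choices for illustration; writing the resulting digest to Ethereum and hosting the content are left out.

```python
import hashlib


def post_digest(text: str) -> str:
    """SHA-256 hex digest of one post after light normalization."""
    # Assumed normalization: unify line endings and trim outer whitespace,
    # so the same post exported twice hashes to the same value.
    normalized = text.replace("\r\n", "\n").strip()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def batch_digest(posts: list[str]) -> str:
    """One digest for a whole batch (e.g. a week's worth) of posts."""
    combined = hashlib.sha256()
    # Sort the per-post digests so the result does not depend on post order.
    for digest in sorted(post_digest(p) for p in posts):
        combined.update(bytes.fromhex(digest))
    return combined.hexdigest()


if __name__ == "__main__":
    week = ["First post body.", "Second post body."]
    print(batch_digest(week))
```

Only the single batch digest would need to go on-chain. Anyone holding the hosted plaintext can recompute it later and compare, which is what would make the publish date checkable.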

            
1 Like
            

                virgil
                
              
                    December 21, 2018, 10:06pm
                  
                  
6
              
            
              I will ask my colleagues at archive.org to look at this.

            
2 Likes
            
