An extremely nice finding in my real-time analytics this morning:
Thanks, Google! You made my day 🙂
How big is the web? Yes, we all know there are some really big fat numbers here, but how big? Google unveils some stats from time to time, and that gives us the picture.
You see the pattern: in the last 14 years the web (or what Google extracts from the web) became a really, really big thing. Note the order of magnitude: in 10 years (from 1998, when Google started with 25 million pages, to 2008, when they celebrated their first trillion) the web grew 40,000 times. And even if that sounds big, it’s nothing compared to the next part of the chart. For the next 4 years, we got something I would call exponential growth: from 1 to 30 trillion webpages. This is growth of over 7 trillion pages per year. The web of today is 1.2 million times bigger than in 1998.
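The ratios above are easy to re-derive. Here is a small Python sketch that uses only the three page counts quoted in the post; the variable names are mine, chosen for illustration:

```python
# Page counts quoted in the post (as reported by Google).
pages_1998 = 25_000_000        # 25 million
pages_2008 = 10**12            # 1 trillion
pages_2012 = 30 * 10**12       # 30 trillion

# How many times bigger the index became over each span.
growth_1998_2008 = pages_2008 / pages_1998
growth_1998_2012 = pages_2012 / pages_1998

# Average number of pages added per year between 2008 and 2012.
pages_per_year = (pages_2012 - pages_2008) / 4

print(f"1998 -> 2008: {growth_1998_2008:,.0f} times")
print(f"1998 -> 2012: {growth_1998_2012:,.0f} times")
print(f"2008 -> 2012: {pages_per_year:,.0f} new pages per year")
```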
Let’s translate the 30 trillion number into something meaningful: you will need over 500 terabytes of storage to save the URLs only. Give it one backup copy, and it’s a petabyte. 500 2TB hard drives. Just for the list of the pages. Don’t ask how much more you need to keep the actual content 🙂
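The post does not say how long an average URL is. The 500 TB figure implies roughly 17 bytes per URL, so the sketch below takes that value as an explicit assumption (real URLs are often longer, which would only push the totals up). The function name and constants are hypothetical, for illustration only:

```python
import math

URLS = 30 * 10**12      # 30 trillion pages, as quoted in the post
BYTES_PER_URL = 17      # ASSUMPTION: implied by the 500 TB figure, not stated in the post
TB = 10**12             # decimal terabyte, as used by drive vendors
DRIVE_TB = 2            # 2TB hard drives, as in the post


def url_storage_tb(urls: int, bytes_per_url: int) -> float:
    """Storage needed for the bare list of URLs, in terabytes."""
    return urls * bytes_per_url / TB


single_copy_tb = url_storage_tb(URLS, BYTES_PER_URL)
with_backup_tb = 2 * single_copy_tb
drives_needed = math.ceil(with_backup_tb / DRIVE_TB)

print(f"One copy:    {single_copy_tb:,.0f} TB")
print(f"With backup: {with_backup_tb:,.0f} TB")
print(f"2TB drives:  {drives_needed}")
```

Changing `BYTES_PER_URL` to a more realistic value is the quickest way to see how sensitive the estimate is.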
There are, currently, about 4300 pages for each human being on this planet. Yes — if every single human alive reads 4300 pages right now, we’ll read the whole web. Go figure…
Another statistic from the big G: they served something like 1.2 trillion searches in 2012, about 100 billion searches per month. Multiply by 10 (the default number of results per page). Imagine that there are no repetitions and Google returns a different URL each and every time. Even then, about 60% of the indexed URLs will never be shown to any searcher. Now you see how powerful the holy algorithm is. Even if we restricted it to show each page only once, it would trash about 60% of all the webpages it knows. Does SEO matter?
But there is another number: in 2012 there were 246 million registered domains (according to Verisign). Almost half of them (about 120 million) are .com and .net domains. About 13% of all .com and .net domains don’t contain any web page, 21% contain only one, and 66% are multi-page domains. To make things simpler, let’s assume it’s the same for all TLDs. This will give us some funny numbers. There are about 32 million domain names without any web page published. There are about 50 million single-page domains. So, almost all of these 30 trillion pages are published on just over 160 million domains. Again: 160 million domains are serving close to 30 trillion web pages. 66% of the domain names in use are responsible for 99.99% of the web. That gives us an average of something like 180,000 pages for each of these multi-page domains. So, is your site close to the average?
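The best-case bound described here (every search returns 10 URLs that nobody has ever been shown before) can be checked with a few lines of Python. The inputs are the post's own figures; the variable names are mine:

```python
searches_2012 = 1_200_000_000_000   # about 1.2 trillion searches, as quoted
results_per_search = 10             # default number of results per page
indexed_pages = 30 * 10**12         # 30 trillion indexed pages

# Upper bound: assume no URL is ever repeated across searches.
max_urls_shown = searches_2012 * results_per_search

shown_share = max_urls_shown / indexed_pages
never_shown_share = 1 - shown_share

print(f"Shown at best: {shown_share:.0%}")
print(f"Never shown:   {never_shown_share:.0%}")
```

With these inputs, at most 40% of the index can ever appear in a result page, which leaves about 60% unseen even in this most generous scenario.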
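For anyone who wants to redo the domain arithmetic, here is a rough Python sketch. It applies the .com/.net percentages to all 246 million domains, which is the same simplifying assumption the post makes; the variable names are hypothetical:

```python
total_domains = 246_000_000      # Verisign figure quoted in the post
total_pages = 30 * 10**12        # 30 trillion indexed pages

# Shares reported for .com/.net, ASSUMED here to hold for every TLD.
empty_domains = total_domains * 13 // 100     # no web page at all
single_domains = total_domains * 21 // 100    # exactly one page
multi_domains = total_domains * 66 // 100     # more than one page

# Single-page domains contribute one page each; the rest sit on multi-page domains.
pages_on_multi = total_pages - single_domains
multi_page_share = pages_on_multi / total_pages
avg_pages_per_multi = pages_on_multi / multi_domains

print(f"Empty domains:       {empty_domains:,}")
print(f"Single-page domains: {single_domains:,}")
print(f"Multi-page domains:  {multi_domains:,}")
print(f"Share of pages on multi-page domains: {multi_page_share:.4%}")
print(f"Average pages per multi-page domain:  {avg_pages_per_multi:,.0f}")
```

The average comes out a little above 180,000 because the sketch keeps the unrounded domain counts.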
But don’t panic! Think about Wikipedia, or Blogger, or WordPress, or YouTube. Wikipedia alone is responsible for almost 30 million of these pages (to put it in context: Wikipedia alone is bigger than the whole web was back in 1998). And these multi-million-page sites are human built. Imagine what the bots can do. So, don’t feel bad if your site doesn’t have hundreds of thousands of pages 😉 But you may start to think about automation…
So many questions pop into my head when I read these numbers, but I will stop here. This post was supposed to be here only for fun. Still, I can’t stop myself from asking at least two of these questions. How will Google handle the growth in the coming years? I expect something between linear and exponential growth, meaning we will see the number double in a year or two… And the more important one: is there a limit of some kind? I have no idea how big the web could possibly be a decade from now, but compared to the past, it sounds really interesting… Can the web grow forever? Is even the sky a limit?
Well, we’ll see…
But I will stop here and go back to work now, because with just a few hundred pages this site is far, far away from the 180,000 average 🙂
Sometimes I am lazy about the code quality of my own pages. If all the browsers are showing the page as expected and visitors are able to get my site content, I can live with some “not-valid” piece of code. But that’s me. When working for clients, I always check the validity of the code (and of the site as a whole). Everything must pass the standard validation tests. And speaking of standards, here is the list of the 5 test tools I am using (almost) every day:
This is the validator. If you have never checked your site’s code against this validator, go and do it. It is not just a validation service, it’s a learning tool too. After fixing the same type of error several times, I tend to avoid the wrong code in the future. In brief: this one (like all W3 Validators) is the industry standard for testing against the industry standards 😉
I am using a bunch of testing tools on a daily basis. In the coming days I’ll publish several lists with my favorite test tools, starting today with my list of speed testing utilities.
These are “real-life” testing utilities. In general, you enter the URL, select among servers/locations, click a button and get the report (after some time of waiting for the actual test to be performed). The advantage of these tools is that you can check the load speed of your site from different locations (different servers in different countries).
From time to time I receive sad/angry/informative messages from users saying that some antivirus scanner is reporting my programs as a virus, a trojan or a suspicious file. Obviously those are “false positives” (the AV vendors use this euphemism to replace “lies”). I have no idea why that happens. In the past I was publishing unsigned setup files, which could have been part of the problem. But now everything is signed with my digital signature. The only reason I can see for this to happen is the paranoid nature of some antivirus programs. Yesterday I had to deal with a disappointed user who purchased my program just to see Nor**n say it’s a virus and stop the tool from working. I refunded the money, but nothing can remove the bad taste, neither for me nor for the user. According to Nor**n, this tool is “suspicious” because it is not very popular. Boy! I am writing niche software, I never intended to be popular with these tools. DrW*b directly says (lies) there is a virus in another of the tools I am publishing. Why? What virus? Kas***sky isn’t reporting any of my tools as a virus right now, but it did that several times in the past… What is this? Why are antivirus programs lying about my programs?
For the last 10 years I have been dealing a lot with download sites. Each title must be submitted to hundreds or even thousands of sites. Sometimes they accept my submissions, sometimes they reject them, but the people behind one site managed to surprise me. I never expected someone to contact me via email and ask for additional info. That means that someone actually cares about this site. They are checking what to list and contacting the developer. And this is the link if you are curious which site this is: LifetimeUpgrades.com
You changed my life.
And you’ll never be forgotten.
You asked, I did it: the Duplicates Finder tool now comes with an “ignore case” option:
Yes! 4 months later, the new version of Screen Ruler is ready and published. A lot of good news (download and try in order to see all the changes) and one not-so-good piece of news (it looks like it is not free anymore)… Here is the download page: /screen-ruler-pro/
Direct download links below (fully-functional free trial version):
MS Windows XP, Vista, 7 (4.9MB; 32bit):
Mac OS X 10.5+ / Intel (13.6MB; 64bit):
One old goal is now a fact: I finally got an Alexa rating under 100K. I am not quite sure what that means, but it should be noted… (I am not sure why I was waiting for this for the last 3 years. Probably something in Alexa gave me this number as a “goal”. Now, when I saw it for the first time, I felt I must be happy, although I don’t even remember why ;))
As you have probably already noticed, I was fixing the site in the last few days. Broken links, bad behavior at small resolutions and so on. However, I think it is fixed now and the new look should be labelled as “ready” with the start of the new month… New month, new look, new luck ;-)
Now I am on my way to making several important decisions about the near future. Which tool must get the priority? The CSS menu generator? The shaker? The new sitemap generator or the good old screen ruler? … There are some important (for me) considerations: should I continue with the current codebase? Should I start with MS Visual Studio? Or why not Qt? Am I planning cross-platform versions? Should I “commercialize” the site a little (in order to make it possible to move it to dedicated hosting and purchase some goods that will make the whole project better)… Well, I am pretty sure all these are not important for you, but I am writing in this blog not only for the visitors. As it was once said: verba volant, scripta manent. So, for my own future reference, let it be written…
Ok, that’s all for now. Let us hope we will have more good news very soon.