You may at some point need to remove a URL from Google so that it no longer appears in search results.
You may find yourself in one of these situations:
- You’re not interested in that content appearing in Google.
- It is a page created by malicious software.
- It is a test page or a page with no content.
- The content is obsolete and will not be updated anymore.
- Confidential data belonging to the website owner was published by mistake.
- The page contains adult content.
- Or any other similar case.
Whatever the case, you need that page to stop appearing in search results. Here we will explain how to remove URLs from your website that you no longer want in Google’s index.
Using the robots meta tag with noindex
The first option to remove a URL from Google is to go to the source code of the page you want to remove and add the robots meta tag with the noindex value.
That is, once you are in the source code of the page, look for the <head> tag. It is in that part of the code that you have to include the following line:
<meta name="robots" content="noindex">
If you analyze the semantics of the code, you will see that you are telling the search robots not to index that page when they reach it.
Thus, the URL will be removed from Google’s index and will no longer appear in search results.
Before adding it, check that the tag is not already on the page. It is possible that this meta element is already included but with an index value; in that case, it is as easy as replacing index with noindex.
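For example, here is a minimal sketch of that change (the title and surrounding markup are hypothetical):

<head>
  <title>Obsolete page</title>
  <!-- Replace this line: -->
  <!-- <meta name="robots" content="index, follow"> -->
  <!-- with this one: -->
  <meta name="robots" content="noindex">
</head>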
Remember that this change not only affects Google’s robot but also those of other search engines such as, for example, Bing or Yahoo.
However, keep in mind that this is not an immediate process, especially if the URL already appeared in Google. If the page is newly created, adding the noindex before robots crawl it means it will never be indexed in the first place.
In addition, noindex is not really a command given to Google through the source code, but rather an indication. This means that Google, when it reaches the page, may or may not follow it.
Removing a URL with Google Search Console
If your website is registered with Google Search Console (the old Google Webmaster Tools), you can also easily remove a URL from Google’s index from there.
Go to the “Removals” section in the left menu and click on the “Temporary removals” tab.
This lets you block a URL for a period of about six months, removing its snippet and cached version from the results.
However, Google may re-crawl the URL through internal or external links, for example, so it could be re-indexed.
Once in that section, click on “New request”, type the URL to be removed, and hit continue. Then you just have to wait for Google to remove the page.
This action affects both the http and https versions of the URL, with and without www. It is important to know this so that you do not send a separate request for each variant: entering mypage.com will be enough.
You can check the status of your removal request on the Google Search Console page itself, so you know whether you need to take any further action.
There you will also see the history of removal requests for all the URLs you have submitted, and you can open a new request if you need to.
Note that this removal will most likely be only temporary, not definitive, as the tool itself warns you when you make the request, so you should check it from time to time.
The reason is that the URL is removed from Google’s search results for only about 90 days; if during that time the URL is picked up again, for example through links pointing to it, it will reappear in the results for the keywords it ranks for.
Google Search Console or robots.txt file?
Although people often use the Google Search Console option for fear of touching the code, the robots meta tag method is usually more effective, because it is the robot itself that reads the instruction not to index the URL, even if there are links pointing to it.
In fact, Google Search Console is advisable for urgent situations, for example when you have mistakenly published a page with personal and confidential data. If you are not in a hurry for the URL to disappear, it is more effective to use one of the other methods.
Removing a URL with robots.txt
The robots.txt file is used to tell search robots which parts of a website they may request and which parts they may not. In this way, you manage which robots can access the website and how they behave within it.
Through this file, using the Disallow directive, we can deindex a URL from Google and from other search engines as well. For example:
User-agent: *
Disallow: /cgi-bin/
Disallow: /tmp/
Disallow: /junk/onepage.html
Disallow: /img/*/*/originals/
To do this, open the robots.txt file and add a Disallow: directive followed by the path of the URL you want to remove, using a separate line for each URL until you have them all.
What this actually does is block access for Google and the other robots, so they will not be able to crawl the page.
For this to work, it is essential that the URL also has the noindex meta tag and has already been deindexed. If the URL is still indexed when you block it via robots.txt, Google will no longer be able to access it, but it will keep it in the index.
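A sketch of that two-step order in robots.txt (the path /old-page.html is hypothetical):

# Step 1: leave the URL crawlable so robots can read its noindex meta tag.
# Step 2: once the URL has dropped out of the index, block further crawling:
User-agent: *
Disallow: /old-page.html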
When finished, save the file and upload it back to the root of the web directory. When Google gets to it again, it will know which URLs to ignore. With this method it is estimated that it takes Google 24 to 48 hours to remove the URL from the index.
How long does it take Google to remove a URL?
In general, it is estimated that Google can take between 3 and 24 hours to remove a URL from its search results. However, as we have seen above, the removal is actually temporary, lasting from 90 days to 6 months depending on the method used.
Once that time has passed, it may reappear in the search results if it has links pointing to it from other websites.
Therefore, to get the URL removed for good, it is best to set up redirects. That way, any link pointing to the URL we want to delete will resolve somewhere else.
Mass URL removal in Google
It may happen that you want to remove several URLs at once. If there are many of them, the processes described above can become a real hassle, but there is a way to do it more easily.
The only solution I know of so far is to install a Google Chrome extension called Google Webmaster Tool Bulk URL Removal, available on GitHub.
Once it is installed, a button will appear in Google Search Console that gives us the option to load a file with all the URLs we want to remove, within the section we have seen before.
As advised in some posts about this extension, you have to be very careful with the file you provide, because any error in it (a blank space, for example) can cause the deindexing of the whole site in Google for at least 90 days.
This file must have each URL on a separate line and the .txt extension. It is most convenient to create it in a plain-text editor such as Notepad, which saves files in that format by default.
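For example, a valid file (with hypothetical URLs) would simply look like this:

https://www.mypage.com/old-post/
https://www.mypage.com/test-page.html
https://www.mypage.com/junk/onepage.html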
Once the file has been loaded, Google Search Console will start sending removal requests automatically, one for each URL in the list. We only have to leave the browser open until it has finished blocking the URLs.
Setting up redirects
Once you have removed the unwanted URLs from your website, you need to set up some redirects so that everything remains well structured on the site.
Although there are several options, here are two:
- A 301 redirect if there is a new URL with content similar to the one that has been removed.
- A 410 (Gone) response if there is no URL to take the place of the deleted one.
In both cases, the search engine will understand that the URL no longer exists; in the first case it will index the new one, and in the second it will remove the URL permanently and simply forget about it, without any negative effect.
Do not confuse the 410 code with the 404 page not found code. While the former tells the search engine that the URL no longer exists and should be forgotten, the latter tells it that it has found an error on your website because it has reached a URL that does not exist.
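As a minimal sketch, assuming an Apache server with mod_alias enabled (the paths are hypothetical), both cases can be set up in the .htaccess file:

# 301: the removed page has a replacement with similar content
Redirect 301 /old-page.html /new-page.html

# 410 Gone: the page was removed for good and has no replacement
Redirect gone /deleted-page.html

The first rule sends visitors and robots to the new URL; the second returns the 410 status so search engines drop the old one.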
Of course, you will have to be a little patient for these changes to take effect, as you have to wait for Google’s robot to crawl your website again, detect them, and remove the content.
You may read my post “htaccess to redirect a url, file, or folder” for further information on how to set up redirections in an htaccess file.
Conclusion
It is highly recommended to review the status of your website’s URLs periodically, since issues such as broken links, content that does not rank, or useless pages directly affect SEO.
There are several online tools that can be used to determine if there are problems with the URLs of a web page. Among them, one of the best you can consult is Google Search Console (formerly known as Webmasters Tools). If you find something wrong, you should fix it as soon as possible to avoid penalties.
In addition, a website with broken or useless URLs harms the user experience: not only will we lose visitors who will not return, but their short stay on the site will also lower its SEO.
Therefore, it is worth not only checking the status of the pages of our project, but also knowing how to react and correct problems in time so that the site is not affected.
Do not be afraid of deleting an indexed URL that may be damaging your rankings.
In summary, the actions to be taken would be:
- Add the robots noindex meta tag in the source code.
- Make the appropriate redirects for each deleted URL.
- Send the URL removal request through Google Search Console.
With these simple steps, we ensure that Google does not index content we do not want, while also cleaning up our web project and preventing such errors from affecting its position in search results.